On Mon, Jan 30, 2023 at 06:30:16PM +0000, Daniel P. Berrangé wrote:
> On Mon, Jan 30, 2023 at 10:17:48AM -0500, Stefan Hajnoczi wrote:
> > On Mon, 30 Jan 2023 at 07:33, Daniel P. Berrangé <berra...@redhat.com> wrote:
> > >
> > > On Sun, Jan 29, 2023 at 06:39:49PM +0800, Sam Li wrote:
> > > > This patch extends virtio-blk emulation to handle zoned device
> > > > commands by calling the new block layer APIs to perform zoned
> > > > device I/O on behalf of the guest. It supports Report Zone, four
> > > > zone operations (open, close, finish, reset), and Append Zone.
> > > >
> > > > The VIRTIO_BLK_F_ZONED feature bit will only be set if the host
> > > > does support zoned block devices. It will not be set for regular
> > > > block devices (conventional zones).
> > > >
> > > > The guest OS can use blktests and fio to test those commands on
> > > > zoned devices. Furthermore, using zonefs to test zone append
> > > > writes is also supported.
> > > >
> > > > Signed-off-by: Sam Li <faithilike...@gmail.com>
> > > > ---
> > > >  hw/block/virtio-blk-common.c |   2 +
> > > >  hw/block/virtio-blk.c        | 394 +++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 396 insertions(+)
> > > >
> > > > @@ -949,6 +1311,30 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
> > > >          blkcfg.write_zeroes_may_unmap = 1;
> > > >          virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
> > > >      }
> > > > +    if (bs->bl.zoned != BLK_Z_NONE) {
> > > > +        switch (bs->bl.zoned) {
> > > > +        case BLK_Z_HM:
> > > > +            blkcfg.zoned.model = VIRTIO_BLK_Z_HM;
> > > > +            break;
> > > > +        case BLK_Z_HA:
> > > > +            blkcfg.zoned.model = VIRTIO_BLK_Z_HA;
> > > > +            break;
> > > > +        default:
> > > > +            g_assert_not_reached();
> > > > +        }
> > > > +
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
> > > > +                     bs->bl.zone_size / 512);
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
> > > > +                     bs->bl.max_active_zones);
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
> > > > +                     bs->bl.max_open_zones);
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
> > > > +                     bs->bl.max_append_sectors);
> > >
> > > So these are all ABI-sensitive frontend device settings, but they
> > > are not exposed as tunables on the virtio-blk device; instead they
> > > are implicitly set from the backend.
> > >
> > > We have done this kind of thing before in QEMU, but several times it
> > > has bitten QEMU maintainers/users, as having a backend affect the
> > > frontend ABI is not typical. It wouldn't be immediately obvious
> > > when starting QEMU on a target host that the live migration would
> > > be breaking ABI if the target host wasn't using a zoned device with
> > > the exact same settings.
> > >
> > > This also limits mgmt flexibility across live migration, if the
> > > mgmt app wants/needs to change the storage backend. E.g. maybe they
> > > need to evacuate the host for an emergency, but don't have spare
> > > hosts with the same kind of storage. It might be desirable to
> > > migrate and switch to a plain block device or raw/qcow2 file,
> > > rather than let the VM die.
> > >
> > > Can we make these virtio settings be explicitly controlled on the
> > > virtio-blk device? If not specified explicitly they could be
> > > auto-populated from the backend for ease of use, but if specified
> > > then simply validate that the backend is a match. libvirt would
> > > then make sure these are always explicitly set on the frontend.
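
[Thinking out loud: the auto-populate-or-validate rule Dan describes
could be a small per-field helper along these lines. Note that
check_or_inherit_u32() is invented here for illustration; nothing like
it exists yet.]

  /* Inherit an unset (0) frontend value from the backend,
   * otherwise insist that frontend and backend match. */
  static bool check_or_inherit_u32(uint32_t *frontend_val,
                                   uint32_t backend_val,
                                   const char *name, Error **errp)
  {
      if (*frontend_val == 0) {
          *frontend_val = backend_val;   /* auto-populate */
          return true;
      }
      if (*frontend_val != backend_val) {
          error_setg(errp, "%s %u does not match backend value %u",
                     name, *frontend_val, backend_val);
          return false;
      }
      return true;
  }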
> > I think this is a good idea, especially if we streamline the
> > file-posix.c driver by merging --blockdev zoned_host_device into
> > --blockdev host_device. It won't be obvious from the command-line
> > whether this is a zoned or non-zoned device. There should be a
> > --device virtio-blk-pci,drive=drive0,zoned=on option that fails when
> > drive0 isn't zoned. It should probably be on/off/auto where auto is
> > the default and doesn't check anything, on requires a zoned device,
> > and off requires a non-zoned device. That will prevent accidental
> > migration between zoned/non-zoned devices.
> >
> > I want to point out that virtio-blk doesn't have checks for the disk
> > size or other details, so what you're suggesting for zone_sectors,
> > etc. is stricter than what QEMU does today. Since the virtio-blk
> > parameters you're proposing are optional, I think it doesn't hurt
> > though.
>
> Yeah, it is slightly different from how some of the other parameters
> are handled. I guess you could say that with disk capacity, matching
> size is a fairly obvious constraint/expectation to manage, and also
> long standing.
>
> With disk capacity, you can add the 'raw' driver on top of any block
> driver stack to apply an arbitrary offset+size, making the storage
> smaller than it otherwise is on disk. Conceptually that could have
> been done in the frontend device(s) too, but I guess it made more
> sense to do it in the block layer to give consistent enforcement
> of the limits across frontends. It is fuzzy whether such a use of
> the 'raw' driver is really considered backend config, as opposed to
> frontend config, but to me it feels like frontend config.
>
> You could possibly come up with the concept of a 'zoned' format driver
> that can be layered on top of a block driver stack to add zoned I/O
> constraints for the sake of compatibility, where none otherwise exist
> in the physical storage. Possibly useful if multiple frontends all
> support zoned storage, to avoid duplicating the constraints across all?
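
For the zoned=on/off/auto option I suggested above, the realize-time
check could be roughly this (a sketch only; the OnOffAuto "zoned"
property does not exist yet):

  /* Hypothetical check in virtio_blk_device_realize() */
  BlockDriverState *bs = blk_bs(conf->conf.blk);
  bool backend_zoned = bs->bl.zoned != BLK_Z_NONE;

  if (conf->zoned == ON_OFF_AUTO_ON && !backend_zoned) {
      error_setg(errp, "zoned=on requires a zoned block backend");
      return;
  }
  if (conf->zoned == ON_OFF_AUTO_OFF && backend_zoned) {
      error_setg(errp, "zoned=off requires a non-zoned block backend");
      return;
  }
  /* ON_OFF_AUTO_AUTO: no check, same as today */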
Maybe:

  DEFINE_BLOCK_ZONED_PROPERTIES(VirtIOBlock, conf.conf),

and then:

  bool blkconf_check_zoned_properties(BlockBackend *blk,
                                      BlockZonedConf *conf,
                                      Error **errp);

That macro and helper function can be shared by all emulated storage
controllers that implement zoned storage.

However, there's one problem: some storage interfaces extend the zoned
storage model (e.g. NVMe ZNS seems to have functionality that's not
available elsewhere). It would be necessary to check whether there is
a common subset of parameters with matching property names (because
terminology could be different) across emulated storage controllers.
But I think it's likely that this will work.

I think the macro and helper function approach is nice because it's
internal to QEMU and users don't need to set up a --blockdev
enforce-zoned.
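
To make that concrete, the macro might expand to something like this
(just a sketch; BlockZonedConf and its field names are invented here,
assuming BlockConf grows a "BlockZonedConf zoned" member):

  typedef struct BlockZonedConf {
      OnOffAuto zoned;
      uint32_t zone_sectors;
      uint32_t max_open_zones;
      uint32_t max_active_zones;
      uint32_t max_append_sectors;
  } BlockZonedConf;

  #define DEFINE_BLOCK_ZONED_PROPERTIES(_state, _conf)                \
      DEFINE_PROP_ON_OFF_AUTO("zoned", _state, _conf.zoned.zoned,     \
                              ON_OFF_AUTO_AUTO),                      \
      DEFINE_PROP_UINT32("zone-sectors", _state,                      \
                         _conf.zoned.zone_sectors, 0),                \
      DEFINE_PROP_UINT32("max-open-zones", _state,                    \
                         _conf.zoned.max_open_zones, 0),              \
      DEFINE_PROP_UINT32("max-active-zones", _state,                  \
                         _conf.zoned.max_active_zones, 0),            \
      DEFINE_PROP_UINT32("max-append-sectors", _state,                \
                         _conf.zoned.max_append_sectors, 0)

blkconf_check_zoned_properties() would then auto-populate any field
still at 0 from the backend's BlockLimits, and fail realize if an
explicitly set value doesn't match the backend.

Stefan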