On 6/28/22 17:05, Sam Li wrote: > Stefan Hajnoczi <stefa...@redhat.com> 于2022年6月28日周二 14:52写道: >> >> On Mon, Jun 27, 2022 at 08:19:13AM +0800, Sam Li wrote: >>> diff --git a/block/block-backend.c b/block/block-backend.c >>> index e0e1aff4b1..786f964d02 100644 >>> --- a/block/block-backend.c >>> +++ b/block/block-backend.c >>> @@ -1810,6 +1810,62 @@ int blk_flush(BlockBackend *blk) >>> return ret; >>> } >>> >>> +/* >>> + * Return zone_report from BlockDriver. Offset can be any number within >>> + * the zone size. No alignment for offset and len. >> >> What is the purpose of len? Is it the maximum number of zones to return >> in nr_zones[]? > > len is actually not used in bdrv_co_zone_report. It is needed by > blk_check_byte_request.
There is no IO buffer being passed. Only an array of zone descriptors and an in-out number of zones. len is definitely not needed. Not sure what blk_check_byte_request() is intended to check though. > >> How does the caller know how many zones were returned? > > nr_zones represents IN maximum and OUT actual. The caller will know by > nr_zones which is changed in bdrv_co_zone_report. I'll add it in the > comments. > >> >>> + */ >>> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset, >>> + int64_t len, int64_t *nr_zones, >>> + BlockZoneDescriptor *zones) >>> +{ >>> + int ret; >>> + BlockDriverState *bs; >>> + IO_CODE(); >>> + >>> + blk_inc_in_flight(blk); /* increase before waiting */ >>> + blk_wait_while_drained(blk); >>> + bs = blk_bs(blk); >>> + >>> + ret = blk_check_byte_request(blk, offset, len); >>> + if (ret < 0) { >>> + return ret; >>> + } >>> + >>> + bdrv_inc_in_flight(bs); >> >> The bdrv_inc/dec_in_flight() call should be inside >> bdrv_co_zone_report(). See bdrv_co_ioctl() for an example. >> >>> + ret = bdrv_co_zone_report(blk->root->bs, offset, len, >>> + nr_zones, zones); >>> + bdrv_dec_in_flight(bs); >>> + blk_dec_in_flight(blk); >>> + return ret; >>> +} >>> + >>> +/* >>> + * Return zone_mgmt from BlockDriver. >> >> Maybe this should be: >> >> Send a zone management command. > > Yes, it's more accurate. > >> >>> @@ -216,6 +217,11 @@ typedef struct RawPosixAIOData { >>> PreallocMode prealloc; >>> Error **errp; >>> } truncate; >>> + struct { >>> + int64_t *nr_zones; >>> + BlockZoneDescriptor *zones; >>> + } zone_report; >>> + zone_op op; >> >> It's cleaner to put op inside a struct zone_mgmt so its purpose is >> self-explanatory: >> >> struct { >> zone_op op; >> } zone_mgmt; >> >>> +static int handle_aiocb_zone_report(void *opaque) { >>> + RawPosixAIOData *aiocb = opaque; >>> + int fd = aiocb->aio_fildes; >>> + int64_t *nr_zones = aiocb->zone_report.nr_zones; >>> + BlockZoneDescriptor *zones = aiocb->zone_report.zones; >>> + int64_t offset = aiocb->aio_offset; >>> + int64_t len = aiocb->aio_nbytes; Since you have the zone array and number of zones (size of that array) I really do not see why you need len. >>> + >>> + struct blk_zone *blkz; >>> + int64_t rep_size, nrz; >>> + int ret, n = 0, i = 0; >>> + >>> + nrz = *nr_zones; >>> + if (len == -1) { >>> + return -errno; >> >> Where is errno set? Should this be an errno constant instead like >> -EINVAL? > > That's right! Noted. > >> >>> + } >>> + rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct >>> blk_zone); >>> + g_autofree struct blk_zone_report *rep = g_new(struct blk_zone_report, >>> nrz); >> >> g_new() looks incorrect. There should be 1 struct blk_zone_report >> followed by nrz struct blk_zone structs. Please use g_malloc(rep_size) >> instead. > > Yes! However, it still has a memory leak error when using g_autofree > && g_malloc. That may be because you are changing the value of the rep pointer while parsing the report ? > >> >>> + offset = offset / 512; /* get the unit of the start sector: sector >>> size is 512 bytes. */ >>> + printf("start to report zone with offset: 0x%lx\n", offset); >>> + >>> + blkz = (struct blk_zone *)(rep + 1); >>> + while (n < nrz) { >>> + memset(rep, 0, rep_size); >>> + rep->sector = offset; >>> + rep->nr_zones = nrz; >> >> What prevents zones[] overflowing? nrz isn't being reduced in the loop, >> so maybe the rep->nr_zones return value will eventually exceed the >> number of elements still available in zones[n..]? > > I suppose the number of zones[] is restricted in the subsequent for > loop where zones[] copy one zone at a time as n increases. Even if > rep->zones exceeds the available room in zones[], the extra zone will > not be copied. > >> >>> +static int handle_aiocb_zone_mgmt(void *opaque) { >>> + RawPosixAIOData *aiocb = opaque; >>> + int fd = aiocb->aio_fildes; >>> + int64_t offset = aiocb->aio_offset; >>> + int64_t len = aiocb->aio_nbytes; >>> + zone_op op = aiocb->op; >>> + >>> + struct blk_zone_range range; >>> + const char *ioctl_name; >>> + unsigned long ioctl_op; >>> + int64_t zone_size; >>> + int64_t zone_size_mask; >>> + int ret; >>> + >>> + ret = ioctl(fd, BLKGETZONESZ, &zone_size); >> >> Can this value be stored in bs (maybe in BlockLimits) to avoid calling >> ioctl(BLKGETZONESZ) each time? > > Yes, zone_size is in the zbd_dev field. I'll update BlockLimits after > I think through the configurations. In addition, it's a temporary > approach. It is substituted by get_sysfs_long_val now. > >> >>> + if (ret) { >>> + return -1; >> >> The return value should be a negative errno. -1 is -EPERM but it's >> probably not that error code you wanted. This should be: >> >> return -errno; >> >>> + } >>> + >>> + zone_size_mask = zone_size - 1; >>> + if (offset & zone_size_mask) { >>> + error_report("offset is not the start of a zone"); >>> + return -1; >> >> return -EINVAL; >> >>> + } >>> + >>> + if (len & zone_size_mask) { >>> + error_report("len is not aligned to zones"); >>> + return -1; >> >> return -EINVAL; >> >>> +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t >>> offset, >>> + int64_t len, int64_t *nr_zones, >>> + BlockZoneDescriptor *zones) { >>> + BDRVRawState *s = bs->opaque; >>> + RawPosixAIOData acb; >>> + >>> + acb = (RawPosixAIOData) { >>> + .bs = bs, >>> + .aio_fildes = s->fd, >>> + .aio_type = QEMU_AIO_IOCTL, >>> + .aio_offset = offset, >>> + .aio_nbytes = len, >>> + .zone_report = { >>> + .nr_zones = nr_zones, >>> + .zones = zones, >>> + }, >> >> Indentation is off here. Please use 4-space indentation. > Noted! > > Thanks for reviewing! > > Sam -- Damien Le Moal Western Digital Research