On 6/28/22 19:23, Sam Li wrote: > Damien Le Moal <damien.lem...@opensource.wdc.com> 于2022年6月28日周二 17:05写道: >> >> On 6/28/22 17:05, Sam Li wrote: >>> Stefan Hajnoczi <stefa...@redhat.com> 于2022年6月28日周二 14:52写道: >>>> >>>> On Mon, Jun 27, 2022 at 08:19:13AM +0800, Sam Li wrote: >>>>> diff --git a/block/block-backend.c b/block/block-backend.c >>>>> index e0e1aff4b1..786f964d02 100644 >>>>> --- a/block/block-backend.c >>>>> +++ b/block/block-backend.c >>>>> @@ -1810,6 +1810,62 @@ int blk_flush(BlockBackend *blk) >>>>> return ret; >>>>> } >>>>> >>>>> +/* >>>>> + * Return zone_report from BlockDriver. Offset can be any number within >>>>> + * the zone size. No alignment for offset and len. >>>> >>>> What is the purpose of len? Is it the maximum number of zones to return >>>> in nr_zones[]? >>> >>> len is actually not used in bdrv_co_zone_report. It is needed by >>> blk_check_byte_request. >> >> There is no IO buffer being passed. Only an array of zone descriptors and >> an in-out number of zones. len is definitely not needed. Not sure what >> blk_check_byte_request() is intended to check though. > > Can I just remove len argument and blk_check_byte_request() if it's not used?
If it is unused, yes, you must remove it. > >> >>> >>>> How does the caller know how many zones were returned? >>> >>> nr_zones represents IN maximum and OUT actual. The caller will know by >>> nr_zones which is changed in bdrv_co_zone_report. I'll add it in the >>> comments. >>> >>>> >>>>> + */ >>>>> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset, >>>>> + int64_t len, int64_t *nr_zones, >>>>> + BlockZoneDescriptor *zones) >>>>> +{ >>>>> + int ret; >>>>> + BlockDriverState *bs; >>>>> + IO_CODE(); >>>>> + >>>>> + blk_inc_in_flight(blk); /* increase before waiting */ >>>>> + blk_wait_while_drained(blk); >>>>> + bs = blk_bs(blk); >>>>> + >>>>> + ret = blk_check_byte_request(blk, offset, len); >>>>> + if (ret < 0) { >>>>> + return ret; >>>>> + } >>>>> + >>>>> + bdrv_inc_in_flight(bs); >>>> >>>> The bdrv_inc/dec_in_flight() call should be inside >>>> bdrv_co_zone_report(). See bdrv_co_ioctl() for an example. >>>> >>>>> + ret = bdrv_co_zone_report(blk->root->bs, offset, len, >>>>> + nr_zones, zones); >>>>> + bdrv_dec_in_flight(bs); >>>>> + blk_dec_in_flight(blk); >>>>> + return ret; >>>>> +} >>>>> + >>>>> +/* >>>>> + * Return zone_mgmt from BlockDriver. >>>> >>>> Maybe this should be: >>>> >>>> Send a zone management command. >>> >>> Yes, it's more accurate. >>> >>>> >>>>> @@ -216,6 +217,11 @@ typedef struct RawPosixAIOData { >>>>> PreallocMode prealloc; >>>>> Error **errp; >>>>> } truncate; >>>>> + struct { >>>>> + int64_t *nr_zones; >>>>> + BlockZoneDescriptor *zones; >>>>> + } zone_report; >>>>> + zone_op op; >>>> >>>> It's cleaner to put op inside a struct zone_mgmt so its purpose is >>>> self-explanatory: >>>> >>>> struct { >>>> zone_op op; >>>> } zone_mgmt; >>>> >>>>> +static int handle_aiocb_zone_report(void *opaque) { >>>>> + RawPosixAIOData *aiocb = opaque; >>>>> + int fd = aiocb->aio_fildes; >>>>> + int64_t *nr_zones = aiocb->zone_report.nr_zones; >>>>> + BlockZoneDescriptor *zones = aiocb->zone_report.zones; >>>>> + int64_t offset = aiocb->aio_offset; >>>>> + int64_t len = aiocb->aio_nbytes; >> >> Since you have the zone array and number of zones (size of that array) I >> really do not see why you need len. >> >>>>> + >>>>> + struct blk_zone *blkz; >>>>> + int64_t rep_size, nrz; >>>>> + int ret, n = 0, i = 0; >>>>> + >>>>> + nrz = *nr_zones; >>>>> + if (len == -1) { >>>>> + return -errno; >>>> >>>> Where is errno set? Should this be an errno constant instead like >>>> -EINVAL? >>> >>> That's right! Noted. >>> >>>> >>>>> + } >>>>> + rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct >>>>> blk_zone); >>>>> + g_autofree struct blk_zone_report *rep = g_new(struct >>>>> blk_zone_report, nrz); >>>> >>>> g_new() looks incorrect. There should be 1 struct blk_zone_report >>>> followed by nrz struct blk_zone structs. Please use g_malloc(rep_size) >>>> instead. >>> >>> Yes! However, it still has a memory leak error when using g_autofree >>> && g_malloc. >> >> That may be because you are changing the value of the rep pointer while >> parsing the report ? > > I am not sure it is the case. Can you show me some way to find the problem? Not sure. I never used this g_malloc()/g_autofree() before so not sure how it works. It may be that g_autofree() work only with g_new() ? Could you try separating the declaration and allocation ? e.g. Declare at the beginning of the function: g_autofree struct blk_zone_report *rep = NULL; And then when needed do: rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone); rep = g_malloc(rep_size); > > Thanks for reviewing! > > >> >>> >>>> >>>>> + offset = offset / 512; /* get the unit of the start sector: sector >>>>> size is 512 bytes. */ >>>>> + printf("start to report zone with offset: 0x%lx\n", offset); >>>>> + >>>>> + blkz = (struct blk_zone *)(rep + 1); >>>>> + while (n < nrz) { >>>>> + memset(rep, 0, rep_size); >>>>> + rep->sector = offset; >>>>> + rep->nr_zones = nrz; >>>> >>>> What prevents zones[] overflowing? nrz isn't being reduced in the loop, >>>> so maybe the rep->nr_zones return value will eventually exceed the >>>> number of elements still available in zones[n..]? >>> >>> I suppose the number of zones[] is restricted in the subsequent for >>> loop where zones[] copy one zone at a time as n increases. Even if >>> rep->zones exceeds the available room in zones[], the extra zone will >>> not be copied. >>> >>>> >>>>> +static int handle_aiocb_zone_mgmt(void *opaque) { >>>>> + RawPosixAIOData *aiocb = opaque; >>>>> + int fd = aiocb->aio_fildes; >>>>> + int64_t offset = aiocb->aio_offset; >>>>> + int64_t len = aiocb->aio_nbytes; >>>>> + zone_op op = aiocb->op; >>>>> + >>>>> + struct blk_zone_range range; >>>>> + const char *ioctl_name; >>>>> + unsigned long ioctl_op; >>>>> + int64_t zone_size; >>>>> + int64_t zone_size_mask; >>>>> + int ret; >>>>> + >>>>> + ret = ioctl(fd, BLKGETZONESZ, &zone_size); >>>> >>>> Can this value be stored in bs (maybe in BlockLimits) to avoid calling >>>> ioctl(BLKGETZONESZ) each time? >>> >>> Yes, zone_size is in the zbd_dev field. I'll update BlockLimits after >>> I think through the configurations. In addition, it's a temporary >>> approach. It is substituted by get_sysfs_long_val now. >>> >>>> >>>>> + if (ret) { >>>>> + return -1; >>>> >>>> The return value should be a negative errno. -1 is -EPERM but it's >>>> probably not that error code you wanted. This should be: >>>> >>>> return -errno; >>>> >>>>> + } >>>>> + >>>>> + zone_size_mask = zone_size - 1; >>>>> + if (offset & zone_size_mask) { >>>>> + error_report("offset is not the start of a zone"); >>>>> + return -1; >>>> >>>> return -EINVAL; >>>> >>>>> + } >>>>> + >>>>> + if (len & zone_size_mask) { >>>>> + error_report("len is not aligned to zones"); >>>>> + return -1; >>>> >>>> return -EINVAL; >>>> >>>>> +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t >>>>> offset, >>>>> + int64_t len, int64_t *nr_zones, >>>>> + BlockZoneDescriptor *zones) { >>>>> + BDRVRawState *s = bs->opaque; >>>>> + RawPosixAIOData acb; >>>>> + >>>>> + acb = (RawPosixAIOData) { >>>>> + .bs = bs, >>>>> + .aio_fildes = s->fd, >>>>> + .aio_type = QEMU_AIO_IOCTL, >>>>> + .aio_offset = offset, >>>>> + .aio_nbytes = len, >>>>> + .zone_report = { >>>>> + .nr_zones = nr_zones, >>>>> + .zones = zones, >>>>> + }, >>>> >>>> Indentation is off here. Please use 4-space indentation. >>> Noted! >>> >>> Thanks for reviewing! >>> >>> Sam >> >> >> -- >> Damien Le Moal >> Western Digital Research -- Damien Le Moal Western Digital Research