> -----Original Message----- > From: Philippe Mathieu-Daudé <phi...@redhat.com> > Sent: Friday, October 30, 2020 3:16 AM > To: Dmitry Fomichev <dmitry.fomic...@wdc.com>; Keith Busch > <kbu...@kernel.org>; Klaus Jensen <k.jen...@samsung.com>; Kevin Wolf > <kw...@redhat.com>; Maxim Levitsky <mlevi...@redhat.com>; Fam Zheng > <f...@euphon.net> > Cc: Alistair Francis <alistair.fran...@wdc.com>; Matias Bjorling > <matias.bjorl...@wdc.com>; Niklas Cassel <niklas.cas...@wdc.com>; > Damien Le Moal <damien.lem...@wdc.com>; qemu-bl...@nongnu.org; > qemu-devel@nongnu.org > Subject: Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace > Command Set > > Hi Dmitry, > > On 10/30/20 3:32 AM, Dmitry Fomichev wrote: > > The emulation code has been changed to advertise NVM Command Set > when > > "zoned" device property is not set (default) and Zoned Namespace > > Command Set otherwise. > > > > Define values and structures that are needed to support Zoned > > Namespace Command Set (NVMe TP 4053) in PCI NVMe controller > emulator. > > Define trace events where needed in newly introduced code. > > > > In order to improve scalability, all open, closed and full zones > > are organized in separate linked lists. Consequently, almost all > > zone operations don't require scanning of the entire zone array > > (which potentially can be quite large) - it is only necessary to > > enumerate one or more zone lists. > > > > Handlers for three new NVMe commands introduced in Zoned Namespace > > Command Set specification are added, namely for Zone Management > > Receive, Zone Management Send and Zone Append. > > > > Device initialization code has been extended to create a proper > > configuration for zoned operation using device properties. > > > > Read/Write command handler is modified to only allow writes at the > > write pointer if the namespace is zoned. For Zone Append command, > > writes implicitly happen at the write pointer and the starting write > > pointer value is returned as the result of the command. Write Zeroes > > handler is modified to add zoned checks that are identical to those > > done as a part of Write flow. > > > > Subsequent commits in this series add ZDE support and checks for > > active and open zone limits. > > > > Signed-off-by: Niklas Cassel <niklas.cas...@wdc.com> > > Signed-off-by: Hans Holmberg <hans.holmb...@wdc.com> > > Signed-off-by: Ajay Joshi <ajay.jo...@wdc.com> > > Signed-off-by: Chaitanya Kulkarni <chaitanya.kulka...@wdc.com> > > Signed-off-by: Matias Bjorling <matias.bjorl...@wdc.com> > > Signed-off-by: Aravind Ramesh <aravind.ram...@wdc.com> > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawas...@wdc.com> > > Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com> > > Signed-off-by: Dmitry Fomichev <dmitry.fomic...@wdc.com> > > Reviewed-by: Niklas Cassel <niklas.cas...@wdc.com> > > --- > > block/nvme.c | 2 +- > > hw/block/nvme-ns.c | 173 ++++++++ > > hw/block/nvme-ns.h | 54 +++ > > hw/block/nvme.c | 977 > +++++++++++++++++++++++++++++++++++++++++- > > hw/block/nvme.h | 8 + > > hw/block/trace-events | 18 +- > > include/block/nvme.h | 113 ++++- > > When you start modifying include/ files, it is recommended > to start using scripts/git.orderfile as this makes review > easier (no need to scroll back / up constantly). >
Hi Philippe, Thanks for the suggestion, indeed it makes browsing through diffs much easier! > As "block/nvme.h" is shared by 2 subsystems, keeping its > changes in a separate patch is preferred. > No problem, I'll move all changes to include/block/nvme.h to a separate patch. > > 7 files changed, 1322 insertions(+), 23 deletions(-) > > > > diff --git a/block/nvme.c b/block/nvme.c > > index 05485fdd11..7a513c9a17 100644 > > --- a/block/nvme.c > > +++ b/block/nvme.c > > @@ -333,7 +333,7 @@ static inline int nvme_translate_error(const > NvmeCqe *c) > > { > > uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF; > > if (status) { > > - trace_nvme_error(le32_to_cpu(c->result), > > + trace_nvme_error(le32_to_cpu(c->result32), > > le16_to_cpu(c->sq_head), > > le16_to_cpu(c->sq_id), > > le16_to_cpu(c->cid), > ... > > > diff --git a/include/block/nvme.h b/include/block/nvme.h > > index 3653b4aefc..ba8a45edf5 100644 > > --- a/include/block/nvme.h > > +++ b/include/block/nvme.h > > @@ -489,6 +489,9 @@ enum NvmeIoCommands { > > NVME_CMD_COMPARE = 0x05, > > NVME_CMD_WRITE_ZEROES = 0x08, > > NVME_CMD_DSM = 0x09, > > + NVME_CMD_ZONE_MGMT_SEND = 0x79, > > + NVME_CMD_ZONE_MGMT_RECV = 0x7a, > > + NVME_CMD_ZONE_APPEND = 0x7d, > > }; > > > > typedef struct QEMU_PACKED NvmeDeleteQ { > > @@ -649,8 +652,10 @@ typedef struct QEMU_PACKED NvmeAerResult { > > } NvmeAerResult; > > > > typedef struct QEMU_PACKED NvmeCqe { > > - uint32_t result; > > - uint32_t rsvd; > > + union { > > + uint64_t result64; > > + uint32_t result32; > > + }; > > When using packed structure you want to define all fields to > avoid alignment confusion (and I'm surprised the compiler doesn't > complain...). So this would be: > > union { > uint64_t result64; > struct { > uint32_t result32; > uint32_t rsvd32; > }; > }; > IMO, the compiler doesn't complain because it's a union. Smaller variants in unions are "padded" to the size of the largest variant regardless of whether the struct is packed or not. > But since the ZNS is still a technical proposal and not in the spec, > this doesn't look correct (the spec list this field as 32-bit). > > What do you think about adding NvmeCqeZNS? > > Maybe: > > typedef struct QEMU_PACKED NvmeCqeZNS { > uint64_t result; > uint16_t sq_head; > uint16_t sq_id; > uint16_t cid; > uint16_t status; > } NvmeCqeZNS; > > Or clever: > > typedef union QEMU_PACKED NvmeCqeZNS { > union { > struct { > uint64_t result; > uint32_t dw2; > uint32_t dw3; > }; > NvmeCqe cqe; > }; > } NvmeCqeZNS; > The 1.4 base spec changes Reserved DW1 in CQE to become the Command Specific DW1, so it would rather make sense to define a command-specific CQE for Zone Append - In include/block/nvme.h: typedef struct QEMU_PACKED NvmeCqe { uint32_t result; - uint32_t rsvd; + uint32_t dw1; uint16_t sq_head; uint16_t sq_id; uint16_t cid; uint16_t status; } NvmeCqe; +/* Zone Append - specific CQE */ +typedef struct QEMU_PACKED NvmeCqeZA { + uint64_t za_slba; + uint16_t sq_head; + uint16_t sq_id; + uint16_t cid; + uint16_t status; +} NvmeCqeZA; ... + QEMU_BUILD_BUG_ON(sizeof(NvmeCqe) != sizeof(NvmeCqeZA)); This will go away with all CQE unions and it will also allow the returned SLBA value to be properly named. What do you think? > I wonder what part could go in hw/block/nvme-ns.h or "block/nvme-zns.h". NvmeCqeZA could simply be defined in include/block/nvme.h next to NvmeCqe. The problem with adding include/block/nvme-zns.h is that it would be hard if not impossible to separate all ZNS-specific content from block/nvme.h and it would become necessary for developers to deal with two files that present different parts of ZNS definitions instead of just one. Best regards, Dmitry > > > uint16_t sq_head; > > uint16_t sq_id; > > uint16_t cid; > > @@ -678,6 +683,7 @@ enum NvmeStatusCodes { > > NVME_SGL_DESCR_TYPE_INVALID = 0x0011, > > NVME_INVALID_USE_OF_CMB = 0x0012, > > NVME_CMD_SET_CMB_REJECTED = 0x002b, > > + NVME_INVALID_CMD_SET = 0x002c, > > NVME_LBA_RANGE = 0x0080, > > NVME_CAP_EXCEEDED = 0x0081, > > NVME_NS_NOT_READY = 0x0082, > > @@ -702,6 +708,14 @@ enum NvmeStatusCodes { > > NVME_CONFLICTING_ATTRS = 0x0180, > > NVME_INVALID_PROT_INFO = 0x0181, > > NVME_WRITE_TO_RO = 0x0182, > > + NVME_ZONE_BOUNDARY_ERROR = 0x01b8, > > + NVME_ZONE_FULL = 0x01b9, > > + NVME_ZONE_READ_ONLY = 0x01ba, > > + NVME_ZONE_OFFLINE = 0x01bb, > > + NVME_ZONE_INVALID_WRITE = 0x01bc, > > + NVME_ZONE_TOO_MANY_ACTIVE = 0x01bd, > > + NVME_ZONE_TOO_MANY_OPEN = 0x01be, > > + NVME_ZONE_INVAL_TRANSITION = 0x01bf, > > NVME_WRITE_FAULT = 0x0280, > > NVME_UNRECOVERED_READ = 0x0281, > > NVME_E2E_GUARD_ERROR = 0x0282, > > @@ -886,6 +900,11 @@ typedef struct QEMU_PACKED NvmeIdCtrl { > > uint8_t vs[1024]; > > } NvmeIdCtrl; > > > > +typedef struct NvmeIdCtrlZoned { > > + uint8_t zasl; > > + uint8_t rsvd1[4095]; > > +} NvmeIdCtrlZoned; > ...