On Thu, Dec 17, 2020 at 10:02:19PM +0100, Klaus Jensen wrote:
> From: Klaus Jensen <k.jen...@samsung.com>
>
> This series adds support for extended LBAs and end-to-end data
> protection. Marked RFC, since there are a bunch of issues that could use
> some discussion.
>
> Storing metadata bytes contiguously with the logical block data and
> creating a physically extended logical block basically breaks the DULBE
> and deallocation support I just added. Formatting a namespace with
> protection information requires the app- and reftags of deallocated or
> unwritten blocks to be 0xffff and 0xffffffff respectively; this could be
> used to reintroduce DULBE support in that case, albeit at a somewhat
> higher cost than the block status flag-based approach.
>
> There are basically three ways of storing metadata (and maybe a fourth,
> but that is probably quite the endeavour):
>
> 1. Storing metadata as extended blocks directly on the blockdev. That
>    is the approach used in this RFC.
>
> 2. Use a separate blockdev. Incidentally, this is also the easiest
>    and most straightforward solution to support MPTR-based "separate
>    metadata". This also allows DULBE and block deallocation to be
>    supported using the existing approach.
>
> 3. A hybrid of 1 and 2 where the metadata is stored contiguously at
>    the end of the nvme-ns blockdev.
>
> Option 1 obviously works well with DIF-based protection information and
> extended LBAs since it maps one to one. Option 2 works flawlessly with
> MPTR-based metadata, but extended LBAs can be "emulated" at the cost of
> a bunch of scatter/gather operations.
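To illustrate the PI-based way of detecting deallocated blocks mentioned above: with the 16-bit guard PI format (as in T10 DIF), each block carries an 8-byte big-endian tuple of a 2-byte guard CRC, a 2-byte application tag and a 4-byte reference tag, placed in either the first or the last 8 bytes of the block's metadata. The Python sketch below treats a block as deallocated/unwritten when its tags are 0xffff and 0xffffffff. The function names and the `pi_first` parameter are hypothetical; this is not QEMU's implementation.

```python
import struct

# Tag values that a deallocated or never-written block's PI tuple is
# expected to carry, as described in the message above.
DEALLOC_APPTAG = 0xFFFF
DEALLOC_REFTAG = 0xFFFFFFFF

PI_TUPLE_SIZE = 8  # 2-byte guard + 2-byte app tag + 4-byte ref tag


def parse_pi_tuple(tuple_bytes: bytes) -> tuple:
    """Decode a big-endian 16-bit-guard PI tuple into (guard, apptag, reftag)."""
    if len(tuple_bytes) != PI_TUPLE_SIZE:
        raise ValueError("PI tuple must be exactly 8 bytes")
    return struct.unpack(">HHI", tuple_bytes)


def is_deallocated(metadata: bytes, pi_first: bool = True) -> bool:
    """Return True if the block's PI tuple carries the deallocated tags.

    pi_first (hypothetical parameter) selects whether the tuple sits in
    the first or the last 8 bytes of the per-block metadata.
    """
    if len(metadata) < PI_TUPLE_SIZE:
        raise ValueError("metadata too small to hold a PI tuple")
    pi = metadata[:PI_TUPLE_SIZE] if pi_first else metadata[-PI_TUPLE_SIZE:]
    _guard, apptag, reftag = parse_pi_tuple(pi)
    return apptag == DEALLOC_APPTAG and reftag == DEALLOC_REFTAG
```

Unlike a block status flag lookup, this check presumably requires reading the metadata of every block in the queried range, which is one plausible source of the extra cost mentioned above.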
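To make the storage options above concrete, here is a minimal Python sketch of where the data and metadata of a given LBA would land on the backing image with option 1 (extended blocks) and option 3 (metadata region at the end), plus the per-block gather/scatter needed to present extended LBAs when data and metadata are stored apart (options 2 and 3). All names are hypothetical, and the sketch assumes a fixed logical block size and a fixed metadata size per block.

```python
def extended_lba_offsets(lba: int, lba_size: int, md_size: int) -> tuple:
    """Option 1: each block's metadata immediately follows its data.

    Returns (data_offset, metadata_offset) in the backing image.
    """
    stride = lba_size + md_size
    data_off = lba * stride
    return data_off, data_off + lba_size


def trailing_md_offsets(lba: int, lba_size: int, md_size: int,
                        nlbas: int) -> tuple:
    """Option 3: all block data first, then one contiguous metadata region."""
    if not 0 <= lba < nlbas:
        raise ValueError("LBA out of range")
    md_base = nlbas * lba_size
    return lba * lba_size, md_base + lba * md_size


def interleave(data: bytes, md: bytes, lba_size: int, md_size: int) -> bytes:
    """Gather separately stored data and metadata into an extended-LBA buffer."""
    nblocks = len(data) // lba_size
    if len(data) != nblocks * lba_size or len(md) != nblocks * md_size:
        raise ValueError("data/metadata sizes do not match block geometry")
    out = bytearray()
    for i in range(nblocks):
        out += data[i * lba_size:(i + 1) * lba_size]
        out += md[i * md_size:(i + 1) * md_size]
    return bytes(out)


def split_extended(buf: bytes, lba_size: int, md_size: int) -> tuple:
    """Scatter an extended-LBA buffer into (data, metadata) byte strings."""
    stride = lba_size + md_size
    if len(buf) % stride:
        raise ValueError("buffer is not a whole number of extended blocks")
    data, md = bytearray(), bytearray()
    for off in range(0, len(buf), stride):
        data += buf[off:off + lba_size]
        md += buf[off + lba_size:off + stride]
    return bytes(data), bytes(md)
```

With option 1 a request maps to one contiguous range of the image, whereas options 2 and 3 need one data range and one metadata range plus the per-block copying above whenever the host uses an extended LBA format.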

Are there any actual users of extended metadata that we care about? I'm
aware of only a few niche places that can even access an extended
metadata format. There's no kernel support in any major OS that I know
of.

Option 2 sounds fine.

If option 3 means that you're still using MPTR, but just sequester space
at the end of the backing block device for metadata purposes, then that
is fine too. You can even resize it dynamically if you want to support
different metadata sizes.

> The 4th option is extending an existing image format (QCOW2) or creating
> something on top of RAW to support metadata bytes per block. But both
> approaches require full API support through the block layer. And
> probably a lot of other stuff that I did not think about.

It definitely sounds appealing to push the feature to a lower level if
you're really willing to see that through.

In any case, calculating T10 CRCs is *really* slow unless you have
special hardware and software support for it.
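For a sense of why the guard CRC is costly in software, here is a naive bit-at-a-time CRC-16/T10-DIF in Python (polynomial 0x8BB7, initial value 0, no reflection, no final XOR). It does eight shift/XOR steps per byte, i.e. 4096 per 512-byte block, for every PI-checked block read or written. This is only an illustrative sketch; practical implementations use lookup tables or carry-less multiplication (for instance, the Linux kernel has a PCLMULQDQ-accelerated crct10dif).

```python
T10DIF_POLY = 0x8BB7  # CRC-16/T10-DIF generator polynomial


def crc16_t10dif(data: bytes, crc: int = 0) -> int:
    """Bit-at-a-time CRC-16/T10-DIF, MSB first.

    Passing a previous result as `crc` continues the computation, so a
    block can be checksummed in pieces.
    """
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

For example, `crc16_t10dif(b"123456789")` yields the standard check value 0xD0DB, and an all-zero block yields a guard of 0.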