On Fri, Jan 21, 2022 at 03:28:33PM +0100, Alexander Graf wrote:
>
> On 19.01.22 19:45, Kevin O'Connor wrote:
> > Commit 01f2736cc905d ("nvme: Pass large I/O requests as PRP lists")
> > introduced multi-page requests using the NVMe PRP mechanism. To store the
> > list and "first page to write to" hints, it added fields to the NVMe
> > namespace struct.
> >
> > Unfortunately, that struct resides in fseg which is read-only at runtime.
> > While KVM ignores the read-only part and allows writes, real hardware and
> > TCG adhere to the semantics and ignore writes to the fseg region. The net
> > effect of that is that reads and writes were always happening on address 0,
> > unless they went through the bounce buffer logic.
> >
> > This patch builds the PRP maintenance data in the existing "dma bounce
> > buffer" and only builds it when needed.
> >
> > Fixes: 01f2736cc905d ("nvme: Pass large I/O requests as PRP lists")
> > Reported-by: Matt DeVillier <[email protected]>
> > Signed-off-by: Alexander Graf <[email protected]>
> > Signed-off-by: Kevin O'Connor <[email protected]>
> > ---
> > src/hw/nvme-int.h | 6 ------
> > src/hw/nvme.c | 51 +++++++++++++++++------------------------------
> > 2 files changed, 18 insertions(+), 39 deletions(-)
> >
> > diff --git a/src/hw/nvme-int.h b/src/hw/nvme-int.h
> > index 9564c17..f0d717d 100644
> > --- a/src/hw/nvme-int.h
> > +++ b/src/hw/nvme-int.h
> > @@ -10,8 +10,6 @@
> > #include "types.h" // u32
> > #include "pcidevice.h" // struct pci_device
> >
> > -#define NVME_MAX_PRPL_ENTRIES 15 /* Allows requests up to 64kb */
> > -
> > /* Data structures */
> >
> > /* The register file of a NVMe host controller. This struct follows the naming
> > @@ -122,10 +120,6 @@ struct nvme_namespace {
> >
> > /* Page aligned buffer of size NVME_PAGE_SIZE. */
> > char *dma_buffer;
> > -
> > - /* Page List */
> > - u32 prpl_len;
> > - u64 prpl[NVME_MAX_PRPL_ENTRIES];
> > };
> >
> > /* Data structures for NVMe admin identify commands */
> > diff --git a/src/hw/nvme.c b/src/hw/nvme.c
> > index 3a73784..39b9138 100644
> > --- a/src/hw/nvme.c
> > +++ b/src/hw/nvme.c
> > @@ -470,35 +470,19 @@ nvme_bounce_xfer(struct nvme_namespace *ns, u64 lba, void *buf, u16 count,
> > return res;
> > }
> >
> > -static void nvme_reset_prpl(struct nvme_namespace *ns)
> > -{
> > - ns->prpl_len = 0;
> > -}
> > -
> > -static int nvme_add_prpl(struct nvme_namespace *ns, u64 base)
> > -{
> > - if (ns->prpl_len >= NVME_MAX_PRPL_ENTRIES)
> > - return -1;
> > -
> > - ns->prpl[ns->prpl_len++] = base;
> > -
> > - return 0;
> > -}
> > +#define NVME_MAX_PRPL_ENTRIES 15 /* Allows requests up to 64kb */
> >
> > // Transfer data using page list (if applicable)
> > static int
> > nvme_prpl_xfer(struct nvme_namespace *ns, u64 lba, void *buf, u16 count,
> > int write)
> > {
> > - int first_page = 1;
> > u32 base = (long)buf;
> > s32 size;
> >
> > if (count > ns->max_req_size)
> > count = ns->max_req_size;
> >
> > - nvme_reset_prpl(ns);
> > -
> > size = count * ns->block_size;
> > /* Special case for transfers that fit into PRP1, but are unaligned */
> > if (((size + (base & ~NVME_PAGE_MASK)) <= NVME_PAGE_SIZE))
> > @@ -512,28 +496,29 @@ nvme_prpl_xfer(struct nvme_namespace *ns, u64 lba, void *buf, u16 count,
> > if (size & (ns->block_size - 1ULL))
> > return nvme_bounce_xfer(ns, lba, buf, count, write);
> >
> > - for (; size > 0; base += NVME_PAGE_SIZE, size -= NVME_PAGE_SIZE) {
> > - if (first_page) {
> > - /* First page is special */
> > - first_page = 0;
> > - continue;
> > - }
> > - if (nvme_add_prpl(ns, base))
> > - return nvme_bounce_xfer(ns, lba, buf, count, write);
> > - }
> > -
> > - void *prp2;
> > if ((ns->block_size * count) > (NVME_PAGE_SIZE * 2)) {
> > - /* We need to describe more than 2 pages, rely on PRP List */
> > - prp2 = ns->prpl;
> > + /* We need to describe more than 2 pages, build PRP List */
> > + u32 prpl_len = 0;
> > + u64 *prpl = (void*)ns->dma_buffer;
>
>
> At 15*8=120 bytes of data, do we even need to put the prpl into dma_buffer
> or can we just use the stack? Will the array be guaranteed 64-bit aligned, or
> would we need to add an attribute to it?
>
> u64 prpl[NVME_MAX_PRPL_ENTRIES];
>
> Either way, I don't have strong feelings one way or another. It just seems
> more natural to keep the bounce buffer purely as a bounce buffer :).
In general it's possible to DMA from the stack (e.g.
src/hw/ohci.c:ohci_send_pipe() ). However, SeaBIOS doesn't guarantee
stack alignment, so it would need to be done manually. Also, I'm not
sure how tight the NVMe request completion code is - if it returns
early for some reason it could cause havoc (device DMA into random
memory).
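As a rough sketch only (hypothetical names, not code from the tree), the
manual alignment could look something like:

    /* Sketch: carve an 8-byte aligned u64 array out of a raw stack
     * buffer, since the stack pointer itself may not be 8-byte
     * aligned. */
    u8 prpl_raw[NVME_MAX_PRPL_ENTRIES * sizeof(u64) + 7];
    u64 *prpl = (u64 *)(((u32)prpl_raw + 7) & ~7);

That still leaves the lifetime question above if the request were somehow
still in flight when the function returns.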
Another option might be to make a single global prpl array, but that
would consume the memory even if NVMe isn't in use.
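For illustration, and assuming something like the VARLOW marker is the
right way to keep it out of the read-only f-segment, that alternative
might look roughly like:

    /* Sketch: a single PRP list shared by all namespaces, placed in
     * low memory so it stays writable at runtime.  Its ~120 bytes are
     * reserved even when no NVMe drive is present. */
    u64 nvme_prpl[NVME_MAX_PRPL_ENTRIES] VARLOW;
    u32 nvme_prpl_len VARLOW;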
FWIW though, I don't see a harm in the single ( u64 *prpl =
(void*)ns->dma_buffer ) line.
Thanks,
-Kevin
>
>
> Alex
>
>
> > + int first_page = 1;
> > + for (; size > 0; base += NVME_PAGE_SIZE, size -= NVME_PAGE_SIZE) {
> > + if (first_page) {
> > + /* First page is special */
> > + first_page = 0;
> > + continue;
> > + }
> > + if (prpl_len >= NVME_MAX_PRPL_ENTRIES)
> > + return nvme_bounce_xfer(ns, lba, buf, count, write);
> > + prpl[prpl_len++] = base;
> > + }
> > + return nvme_io_xfer(ns, lba, buf, prpl, count, write);
> > } else if ((ns->block_size * count) > NVME_PAGE_SIZE) {
> > /* Directly embed the 2nd page if we only need 2 pages */
> > - prp2 = (void *)(long)ns->prpl[0];
> > + return nvme_io_xfer(ns, lba, buf, buf + NVME_PAGE_SIZE, count, write);
> > } else {
> > /* One page is enough, don't expose anything else */
> > - prp2 = NULL;
> > + return nvme_io_xfer(ns, lba, buf, NULL, count, write);
> > }
> > - return nvme_io_xfer(ns, lba, buf, prp2, count, write);
> > }
> >
> > static int
> > --
> > 2.31.1
> >
>
>
>