On Thu, Oct 3, 2024 at 6:06 AM Daniel Henrique Barboza
<dbarb...@ventanamicro.com> wrote:
>
>
>
> On 10/3/24 6:26 AM, Andrew Jones wrote:
> > On Tue, Oct 01, 2024 at 10:02:58PM GMT, Daniel Henrique Barboza wrote:
> > ...
> >> +/*
> >> + * RISCV IOMMU Address Translation Lookup - Page Table Walk
> >> + *
> >> + * Note: Code is based on get_physical_address() from 
> >> target/riscv/cpu_helper.c
> >> + * Both implementation can be merged into single helper function in 
> >> future.
> >> + * Keeping them separate for now, as error reporting and flow specifics 
> >> are
> >> + * sufficiently different for separate implementation.
> >> + *
> >> + * @s        : IOMMU Device State
> >> + * @ctx      : Translation context for device id and process address 
> >> space id.
> >> + * @iotlb    : translation data: physical address and access mode.
> >> + * @return   : success or fault cause code.
> >> + */
> >> +static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext 
> >> *ctx,
> >> +    IOMMUTLBEntry *iotlb)
> >> +{
> >> +    dma_addr_t addr, base;
> >> +    uint64_t satp, gatp, pte;
> >> +    bool en_s, en_g;
> >> +    struct {
> >> +        unsigned char step;
> >> +        unsigned char levels;
> >> +        unsigned char ptidxbits;
> >> +        unsigned char ptesize;
> >> +    } sc[2];
> >> +    /* Translation stage phase */
> >> +    enum {
> >> +        S_STAGE = 0,
> >> +        G_STAGE = 1,
> >> +    } pass;
> >> +    MemTxResult ret;
> >> +
> >> +    satp = get_field(ctx->satp, RISCV_IOMMU_ATP_MODE_FIELD);
> >> +    gatp = get_field(ctx->gatp, RISCV_IOMMU_ATP_MODE_FIELD);
> >> +
> >> +    en_s = satp != RISCV_IOMMU_DC_FSC_MODE_BARE;
> >> +    en_g = gatp != RISCV_IOMMU_DC_IOHGATP_MODE_BARE;
> >> +
> >> +    /*
> >> +     * Early check for MSI address match when IOVA == GPA. This check
> >> +     * is required to ensure MSI translation is applied in case
> >> +     * first-stage translation is set to BARE mode. In this case IOVA
> >> +     * provided is a valid GPA. Running translation through page walk
> >> +     * second stage translation will incorrectly try to translate GPA
> >> +     * to host physical page, likely hitting IOPF.
> >> +     */
> >
> > Why was this comment expanded from the simple
> >
> > "Early check for MSI address match when IOVA == GPA."
> >
> > The comment is now incorrect since the check is required even when
> > first-stage translation is not BARE. I just skimmed the spec again trying
> > to figure out if the removal of '!en_s' is a hack or a fix, and I'm
> > inclined to say "fix", but it's an incomplete fix. I found a sentence that
> > says
> >
> > "If the virtual memory scheme selected for first-stage is Bare but the
> > scheme for the second-stage is not Bare then the IOVA is a GPA."
> >
> > which, in a way, defines a GPA to only be a GPA when second-stage is used
> > (and all MSI translation specifications refer to GPAs). However, maybe I
> > missed it, but I couldn't find any actual reason that the MSI table can't
> > be used when first-stage is not BARE and second-stage is (and, of course,
> > it makes no difference for single-stage translations to call IOVAs GPAs
> > or not).
> >
> > Now, I also see
> >
> > "If the virtual memory scheme selected for neither stage is Bare then the
> > IOVA is a VA. Two-stage address translation is in effect. The first-stage
> > translates the VA to a GPA and the second-stage translates the GPA to a
> > SPA."
> >
> > in the spec, which means we should probably change the removal of '!en_s'
> > to '!(en_s && en_g)'. VFIO+irqbypass would still work with that and, when
> > Linux learns to support two-stage translation, we wouldn't incorrectly try
> > to check for a GVA in the MSI table.
>
> Ok. It seems to me that we can't rely on the riscv-iommu spec alone to let
> us know how to detect if IOVA == GPA, given that one of the main usages
> we have ATM (VFIO irqbypass) will use GPAs with S-stage enabled.
>
> (Note: shouldn't we open a bug against the riscv-iommu spec?)
>
> I like the idea of ruling out the case where IOVA == VA since that is clear
> in the spec. We also know that MSI entries will always contains GPAs.
>
> This is the change I'm going to make in v9:
>
>

It might be worth it to quote spec definitions of IOVA, GPA, SPA.

IOVA: I/O Virtual Address: Virtual address for DMA by devices
GPA: Guest Physical Address: An address in the virtualized physical
memory space of a virtual machine.
SPA: Supervisor Physical Address: Physical address used to access
memory and memory-mapped resources.

>From IOMMU specification it's pretty clear, an untranslated address is
always an IOVA.

If you look at the introduction section and section 2.3. "Process to
translate an IOVA", translation (if enabled by the device/process
context) to GPA, SPA is expressed as:

step #17: GPA = FSC[IOVA]
step #19: SPA = IOHGATP[GPA]

With that, MSI address translations always use GPA as an input
(section 2.3.3. "Process to translate addresses of MSIs"), and is done
as step #18 of the IOVA translation.

And now:
If FSC.Mode == Bare, step #17 translation is simple GPA = IOVA
If IOHGATP.Mode == Bare, step #19 translation is simple SPA = GPA

If both FSC and IOHGATP modes are bare, no address translation is
performed, and SPA = GPA = IOVA. However if "MSI page tables are
enabled, then the MSI address translation process specified in Section
2.3.3 is invoked".  This was the reason to put the
riscv_iommu_msi_check() function before check for pass-through mode
check.

If FSC.Mode == Bare and untranslated address given to the function
riscv_iommu_spa_fetch() is an GPA = IOVA. It could also be SPA if
IOHGATP.Mode == Bare, but it is irrelevant for the MSI address
translation.  That is the reason to put only the check for !en_s in
before riscv_iommu_msi_check() call.

I'm aware that VFIO uses S-Stage translation only, but it does not
change the hardware logic. For such cases VFIO can provide IMSIC
register mappings in S-Stage PT to emulate MSI doorbell registers (if
needed). When S-Stage translation is used, an untranslated address
processed by the IOMMU hardware is still an IOVA, even if the guest OS
/ VFIO user treats it as PA.

Modification of the check to include en_g ( !(en_s && en_g) && ,,, )
will incorrectly enable MSI address translation for an IOVA address
(as defined in IOMMU spec) if G-Stage translation is set to Bare mode.

I'd suggest to keep the s/g stage checks here as:
      if ((iotlb->perm & IOMMU_WO) && !en_s &&
          riscv_iommu_msi_check(s, ctx, iotlb->iova)) {
          ...
      }

Hope that helps this discussion.
Best,
- Tomasz

>      /*
>       * Early check for MSI address match when IOVA == GPA to
>       * support VFIO+irqbypass. The riscv-iommu spec doesn't
>       * consider the case where a GPA can be produced by S-stage
>       * only, but we have real life examples like Linux VFIO that
>       * work that way. The spec alone does not provide a reliable
>       * way of detecting if IOVA == GPA.
>       *
>       * The spec is clear about what is a VA: "If the virtual
>       * memory scheme selected for neither stage is Bare then
>       * the IOVA is a VA", in our case "(en_s && en_g)". We also
>       * know that MSI tables will always hold GPAs.
>       *
>       * Thus the check consists of ruling out VAs and checking
>       * the MSI table.
>       */
>      if (!(en_s && en_g) && (iotlb->perm & IOMMU_WO) &&
>          riscv_iommu_msi_check(s, ctx, iotlb->iova)) {
>          iotlb->target_as = &s->trap_as;
>          iotlb->translated_addr = iotlb->iova;
>          iotlb->addr_mask = ~TARGET_PAGE_MASK;
>          return 0;
>      }
>
> Tomasz, let me know if you have any opinions against it. I intend to send
> the v9 start of next week.
>
>
> Thanks,
>
>
> Daniel
>
>
> >
> > Thanks,
> > drew

Reply via email to