On 10/3/24 6:26 AM, Andrew Jones wrote:
On Tue, Oct 01, 2024 at 10:02:58PM GMT, Daniel Henrique Barboza wrote:
...
+/*
+ * RISCV IOMMU Address Translation Lookup - Page Table Walk
+ *
+ * Note: Code is based on get_physical_address() from target/riscv/cpu_helper.c
+ * Both implementation can be merged into single helper function in future.
+ * Keeping them separate for now, as error reporting and flow specifics are
+ * sufficiently different for separate implementation.
+ *
+ * @s        : IOMMU Device State
+ * @ctx      : Translation context for device id and process address space id.
+ * @iotlb    : translation data: physical address and access mode.
+ * @return   : success or fault cause code.
+ */
+static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
+    IOMMUTLBEntry *iotlb)
+{
+    dma_addr_t addr, base;
+    uint64_t satp, gatp, pte;
+    bool en_s, en_g;
+    struct {
+        unsigned char step;
+        unsigned char levels;
+        unsigned char ptidxbits;
+        unsigned char ptesize;
+    } sc[2];
+    /* Translation stage phase */
+    enum {
+        S_STAGE = 0,
+        G_STAGE = 1,
+    } pass;
+    MemTxResult ret;
+
+    satp = get_field(ctx->satp, RISCV_IOMMU_ATP_MODE_FIELD);
+    gatp = get_field(ctx->gatp, RISCV_IOMMU_ATP_MODE_FIELD);
+
+    en_s = satp != RISCV_IOMMU_DC_FSC_MODE_BARE;
+    en_g = gatp != RISCV_IOMMU_DC_IOHGATP_MODE_BARE;
+
+    /*
+     * Early check for MSI address match when IOVA == GPA. This check
+     * is required to ensure MSI translation is applied in case
+     * first-stage translation is set to BARE mode. In this case IOVA
+     * provided is a valid GPA. Running translation through page walk
+     * second stage translation will incorrectly try to translate GPA
+     * to host physical page, likely hitting IOPF.
+     */

Why was this comment expanded from the simple

"Early check for MSI address match when IOVA == GPA."

The comment is now incorrect since the check is required even when
first-stage translation is not BARE. I just skimmed the spec again trying
to figure out if the removal of '!en_s' is a hack or a fix, and I'm
inclined to say "fix", but it's an incomplete fix. I found a sentence that
says

"If the virtual memory scheme selected for first-stage is Bare but the
scheme for the second-stage is not Bare then the IOVA is a GPA."

which, in a way, defines a GPA to only be a GPA when second-stage is used
(and all MSI translation specifications refer to GPAs). However, maybe I
missed it, but I couldn't find any actual reason that the MSI table can't
be used when first-stage is not BARE and second-stage is (and, of course,
it makes no difference for single-stage translations to call IOVAs GPAs
or not).

Now, I also see

"If the virtual memory scheme selected for neither stage is Bare then the
IOVA is a VA. Two-stage address translation is in effect. The first-stage
translates the VA to a GPA and the second-stage translates the GPA to a
SPA."

in the spec, which means we should probably change the removal of '!en_s'
to '!(en_s && en_g)'. VFIO+irqbypass would still work with that and, when
Linux learns to support two-stage translation, we wouldn't incorrectly try
to check for a GVA in the MSI table.

Ok. It seems to me that we can't rely on the riscv-iommu spec alone to let
us know how to detect if IOVA == GPA, given that one of the main usages
we have ATM (VFIO irqbypass) will use GPAs with S-stage enabled.

(Note: shouldn't we open a bug against the riscv-iommu spec?)

I like the idea of ruling out the case where IOVA == VA since that is clear
in the spec. We also know that MSI entries will always contains GPAs.

This is the change I'm going to make in v9:


    /*
     * Early check for MSI address match when IOVA == GPA to
     * support VFIO+irqbypass. The riscv-iommu spec doesn't
     * consider the case where a GPA can be produced by S-stage
     * only, but we have real life examples like Linux VFIO that
     * work that way. The spec alone does not provide a reliable
     * way of detecting if IOVA == GPA.
     *
     * The spec is clear about what is a VA: "If the virtual
     * memory scheme selected for neither stage is Bare then
     * the IOVA is a VA", in our case "(en_s && en_g)". We also
     * know that MSI tables will always hold GPAs.
     *
     * Thus the check consists of ruling out VAs and checking
     * the MSI table.
     */
    if (!(en_s && en_g) && (iotlb->perm & IOMMU_WO) &&
        riscv_iommu_msi_check(s, ctx, iotlb->iova)) {
        iotlb->target_as = &s->trap_as;
        iotlb->translated_addr = iotlb->iova;
        iotlb->addr_mask = ~TARGET_PAGE_MASK;
        return 0;
    }

Tomasz, let me know if you have any opinions against it. I intend to send
the v9 start of next week.


Thanks,


Daniel



Thanks,
drew

Reply via email to