On 10/3/24 6:26 AM, Andrew Jones wrote:
On Tue, Oct 01, 2024 at 10:02:58PM GMT, Daniel Henrique Barboza wrote:
...
+/*
+ * RISCV IOMMU Address Translation Lookup - Page Table Walk
+ *
+ * Note: Code is based on get_physical_address() from target/riscv/cpu_helper.c
+ * Both implementation can be merged into single helper function in future.
+ * Keeping them separate for now, as error reporting and flow specifics are
+ * sufficiently different for separate implementation.
+ *
+ * @s : IOMMU Device State
+ * @ctx : Translation context for device id and process address space id.
+ * @iotlb : translation data: physical address and access mode.
+ * @return : success or fault cause code.
+ */
+static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
+ IOMMUTLBEntry *iotlb)
+{
+ dma_addr_t addr, base;
+ uint64_t satp, gatp, pte;
+ bool en_s, en_g;
+ struct {
+ unsigned char step;
+ unsigned char levels;
+ unsigned char ptidxbits;
+ unsigned char ptesize;
+ } sc[2];
+ /* Translation stage phase */
+ enum {
+ S_STAGE = 0,
+ G_STAGE = 1,
+ } pass;
+ MemTxResult ret;
+
+ satp = get_field(ctx->satp, RISCV_IOMMU_ATP_MODE_FIELD);
+ gatp = get_field(ctx->gatp, RISCV_IOMMU_ATP_MODE_FIELD);
+
+ en_s = satp != RISCV_IOMMU_DC_FSC_MODE_BARE;
+ en_g = gatp != RISCV_IOMMU_DC_IOHGATP_MODE_BARE;
+
+ /*
+ * Early check for MSI address match when IOVA == GPA. This check
+ * is required to ensure MSI translation is applied in case
+ * first-stage translation is set to BARE mode. In this case IOVA
+ * provided is a valid GPA. Running translation through page walk
+ * second stage translation will incorrectly try to translate GPA
+ * to host physical page, likely hitting IOPF.
+ */
Why was this comment expanded from the simple
"Early check for MSI address match when IOVA == GPA."
The comment is now incorrect since the check is required even when
first-stage translation is not BARE. I just skimmed the spec again trying
to figure out if the removal of '!en_s' is a hack or a fix, and I'm
inclined to say "fix", but it's an incomplete fix. I found a sentence that
says
"If the virtual memory scheme selected for first-stage is Bare but the
scheme for the second-stage is not Bare then the IOVA is a GPA."
which, in a way, defines a GPA to only be a GPA when second-stage is used
(and all MSI translation specifications refer to GPAs). However, maybe I
missed it, but I couldn't find any actual reason that the MSI table can't
be used when first-stage is not BARE and second-stage is (and, of course,
it makes no difference for single-stage translations to call IOVAs GPAs
or not).
Now, I also see
"If the virtual memory scheme selected for neither stage is Bare then the
IOVA is a VA. Two-stage address translation is in effect. The first-stage
translates the VA to a GPA and the second-stage translates the GPA to a
SPA."
in the spec, which means we should probably change the removal of '!en_s'
to '!(en_s && en_g)'. VFIO+irqbypass would still work with that and, when
Linux learns to support two-stage translation, we wouldn't incorrectly try
to check for a GVA in the MSI table.
Ok. It seems to me that we can't rely on the riscv-iommu spec alone to let
us know how to detect if IOVA == GPA, given that one of the main usages
we have ATM (VFIO irqbypass) will use GPAs with S-stage enabled.
(Note: shouldn't we open a bug against the riscv-iommu spec?)
I like the idea of ruling out the case where IOVA == VA since that is clear
in the spec. We also know that MSI entries will always contains GPAs.
This is the change I'm going to make in v9:
/*
* Early check for MSI address match when IOVA == GPA to
* support VFIO+irqbypass. The riscv-iommu spec doesn't
* consider the case where a GPA can be produced by S-stage
* only, but we have real life examples like Linux VFIO that
* work that way. The spec alone does not provide a reliable
* way of detecting if IOVA == GPA.
*
* The spec is clear about what is a VA: "If the virtual
* memory scheme selected for neither stage is Bare then
* the IOVA is a VA", in our case "(en_s && en_g)". We also
* know that MSI tables will always hold GPAs.
*
* Thus the check consists of ruling out VAs and checking
* the MSI table.
*/
if (!(en_s && en_g) && (iotlb->perm & IOMMU_WO) &&
riscv_iommu_msi_check(s, ctx, iotlb->iova)) {
iotlb->target_as = &s->trap_as;
iotlb->translated_addr = iotlb->iova;
iotlb->addr_mask = ~TARGET_PAGE_MASK;
return 0;
}
Tomasz, let me know if you have any opinions against it. I intend to send
the v9 start of next week.
Thanks,
Daniel
Thanks,
drew