On Thu, Oct 03, 2024 at 11:36:00AM GMT, Tomasz Jeznach wrote: > On Thu, Oct 3, 2024 at 6:06 AM Daniel Henrique Barboza > <dbarb...@ventanamicro.com> wrote: > > > > > > > > On 10/3/24 6:26 AM, Andrew Jones wrote: > > > On Tue, Oct 01, 2024 at 10:02:58PM GMT, Daniel Henrique Barboza wrote: > > > ... > > >> +/* > > >> + * RISCV IOMMU Address Translation Lookup - Page Table Walk > > >> + * > > >> + * Note: Code is based on get_physical_address() from > > >> target/riscv/cpu_helper.c > > >> + * Both implementation can be merged into single helper function in > > >> future. > > >> + * Keeping them separate for now, as error reporting and flow specifics > > >> are > > >> + * sufficiently different for separate implementation. > > >> + * > > >> + * @s : IOMMU Device State > > >> + * @ctx : Translation context for device id and process address > > >> space id. > > >> + * @iotlb : translation data: physical address and access mode. > > >> + * @return : success or fault cause code. > > >> + */ > > >> +static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext > > >> *ctx, > > >> + IOMMUTLBEntry *iotlb) > > >> +{ > > >> + dma_addr_t addr, base; > > >> + uint64_t satp, gatp, pte; > > >> + bool en_s, en_g; > > >> + struct { > > >> + unsigned char step; > > >> + unsigned char levels; > > >> + unsigned char ptidxbits; > > >> + unsigned char ptesize; > > >> + } sc[2]; > > >> + /* Translation stage phase */ > > >> + enum { > > >> + S_STAGE = 0, > > >> + G_STAGE = 1, > > >> + } pass; > > >> + MemTxResult ret; > > >> + > > >> + satp = get_field(ctx->satp, RISCV_IOMMU_ATP_MODE_FIELD); > > >> + gatp = get_field(ctx->gatp, RISCV_IOMMU_ATP_MODE_FIELD); > > >> + > > >> + en_s = satp != RISCV_IOMMU_DC_FSC_MODE_BARE; > > >> + en_g = gatp != RISCV_IOMMU_DC_IOHGATP_MODE_BARE; > > >> + > > >> + /* > > >> + * Early check for MSI address match when IOVA == GPA. This check > > >> + * is required to ensure MSI translation is applied in case > > >> + * first-stage translation is set to BARE mode. In this case IOVA > > >> + * provided is a valid GPA. Running translation through page walk > > >> + * second stage translation will incorrectly try to translate GPA > > >> + * to host physical page, likely hitting IOPF. > > >> + */ > > > > > > Why was this comment expanded from the simple > > > > > > "Early check for MSI address match when IOVA == GPA." > > > > > > The comment is now incorrect since the check is required even when > > > first-stage translation is not BARE. I just skimmed the spec again trying > > > to figure out if the removal of '!en_s' is a hack or a fix, and I'm > > > inclined to say "fix", but it's an incomplete fix. I found a sentence that > > > says > > > > > > "If the virtual memory scheme selected for first-stage is Bare but the > > > scheme for the second-stage is not Bare then the IOVA is a GPA." > > > > > > which, in a way, defines a GPA to only be a GPA when second-stage is used > > > (and all MSI translation specifications refer to GPAs). However, maybe I > > > missed it, but I couldn't find any actual reason that the MSI table can't > > > be used when first-stage is not BARE and second-stage is (and, of course, > > > it makes no difference for single-stage translations to call IOVAs GPAs > > > or not). > > > > > > Now, I also see > > > > > > "If the virtual memory scheme selected for neither stage is Bare then the > > > IOVA is a VA. Two-stage address translation is in effect. The first-stage > > > translates the VA to a GPA and the second-stage translates the GPA to a > > > SPA." > > > > > > in the spec, which means we should probably change the removal of '!en_s' > > > to '!(en_s && en_g)'. VFIO+irqbypass would still work with that and, when > > > Linux learns to support two-stage translation, we wouldn't incorrectly try > > > to check for a GVA in the MSI table. > > > > Ok. It seems to me that we can't rely on the riscv-iommu spec alone to let > > us know how to detect if IOVA == GPA, given that one of the main usages > > we have ATM (VFIO irqbypass) will use GPAs with S-stage enabled. > > > > (Note: shouldn't we open a bug against the riscv-iommu spec?) > > > > I like the idea of ruling out the case where IOVA == VA since that is clear > > in the spec. We also know that MSI entries will always contains GPAs. > > > > This is the change I'm going to make in v9: > > > > > > It might be worth it to quote spec definitions of IOVA, GPA, SPA. > > IOVA: I/O Virtual Address: Virtual address for DMA by devices > GPA: Guest Physical Address: An address in the virtualized physical > memory space of a virtual machine. > SPA: Supervisor Physical Address: Physical address used to access > memory and memory-mapped resources. > > From IOMMU specification it's pretty clear, an untranslated address is > always an IOVA. > > If you look at the introduction section and section 2.3. "Process to > translate an IOVA", translation (if enabled by the device/process > context) to GPA, SPA is expressed as: > > step #17: GPA = FSC[IOVA] > step #19: SPA = IOHGATP[GPA] > > With that, MSI address translations always use GPA as an input > (section 2.3.3. "Process to translate addresses of MSIs"), and is done > as step #18 of the IOVA translation. > > And now: > If FSC.Mode == Bare, step #17 translation is simple GPA = IOVA > If IOHGATP.Mode == Bare, step #19 translation is simple SPA = GPA > > If both FSC and IOHGATP modes are bare, no address translation is > performed, and SPA = GPA = IOVA. However if "MSI page tables are > enabled, then the MSI address translation process specified in Section > 2.3.3 is invoked". This was the reason to put the > riscv_iommu_msi_check() function before check for pass-through mode > check. > > If FSC.Mode == Bare and untranslated address given to the function > riscv_iommu_spa_fetch() is an GPA = IOVA. It could also be SPA if > IOHGATP.Mode == Bare, but it is irrelevant for the MSI address > translation. That is the reason to put only the check for !en_s in > before riscv_iommu_msi_check() call. > > I'm aware that VFIO uses S-Stage translation only, but it does not > change the hardware logic. For such cases VFIO can provide IMSIC > register mappings in S-Stage PT to emulate MSI doorbell registers (if > needed). When S-Stage translation is used, an untranslated address > processed by the IOMMU hardware is still an IOVA, even if the guest OS > / VFIO user treats it as PA.
This would work for guest interrupt file mappings, but MRIFs won't work and the note of the "Device contexts at an IOMMU" section of the AIA spec points out that VCPU migration may become more difficult to manage. > > Modification of the check to include en_g ( !(en_s && en_g) && ,,, ) > will incorrectly enable MSI address translation for an IOVA address > (as defined in IOMMU spec) if G-Stage translation is set to Bare mode. > > I'd suggest to keep the s/g stage checks here as: > if ((iotlb->perm & IOMMU_WO) && !en_s && > riscv_iommu_msi_check(s, ctx, iotlb->iova)) { > ... > } > > Hope that helps this discussion. Yes, it certainly helps, thank you. It points out that step 17 of "Process to translate an IOVA" produces a GPA which is consumed by step 18 for the MSI translation (and there's no other way to get to step 18). OIOW, we must do a first-stage translation before we check the MSI page table, which means we can't do an early MSI check unless we have strictly !en_s. This is unfortunate, considering the current state of VFIO+irqbypass and the IOMMU driver. I also wonder if it's really necessary. It seems to me that the MSI page table should always be checked first when either S-stage or G-stage is Bare (so there's only a single-stage of translation) and msiptp.MODE is not Off. I can't see what harm could come from that and it would give system software the ability to leverage code that already knows how to manage single-stage IOMMUs for device assignment. Thanks, drew