On Fri, Sep 24, 2021 at 11:42:13AM +0200, Jan Beulich wrote:
> In order to be able to insert/remove super-pages we need to allow
> callers of the walking function to specify at which point to stop the
> walk.
> 
> For intel_iommu_lookup_page() integrate the last level access into
> the main walking function.
> 
> dma_pte_clear_one() gets only partly adjusted for now: Error handling
> and order parameter get put in place, but the order parameter remains
> ignored (just like intel_iommu_map_page()'s order part of the flags).
> 
> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> ---
> I have to admit that I don't understand why domain_pgd_maddr() wants to
> populate all page table levels for DFN 0.

I think it would be enough to create up to the level requested by the
caller?

Seems like a lazy way to always assert that the level requested by the
caller would be present.

> 
> I was actually wondering whether it wouldn't make sense to integrate
> dma_pte_clear_one() into its only caller intel_iommu_unmap_page(), for
> better symmetry with intel_iommu_map_page().
> ---
> v2: Fix build.
> 
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -264,63 +264,116 @@ static u64 bus_to_context_maddr(struct v
>      return maddr;
>  }
>  
> -static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
> +/*
> + * This function walks (and if requested allocates) page tables to the
> + * designated target level. It returns
> + * - 0 when a non-present entry was encountered and no allocation was
> + *   requested,
> + * - a small positive value (the level, i.e. below PAGE_SIZE) upon allocation
> + *   failure,
> + * - for target > 0 the address of the page table holding the leaf PTE for
                          ^ physical

I think it's clearer, as the return type could be ambiguous.

> + *   the requested address,
> + * - for target == 0 the full PTE.

Could this create confusion if for example one PTE maps physical page
0? As the caller when getting back a full PTE with address 0 and some of
the low bits set could interpret the result as an error.

I think we already had this discussion on other occasions, but I would
rather add a parameter to be used as a return placeholder (ie: a
*dma_pte maybe?) and use the function return value just for errors
because IMO it's clearer, but I know you don't usually like this
approach, so I'm not going to insist.

> + */
> +static uint64_t addr_to_dma_page_maddr(struct domain *domain, daddr_t addr,
> +                                       unsigned int target,
> +                                       unsigned int *flush_flags, bool alloc)
>  {
>      struct domain_iommu *hd = dom_iommu(domain);
>      int addr_width = agaw_to_width(hd->arch.vtd.agaw);
>      struct dma_pte *parent, *pte = NULL;
> -    int level = agaw_to_level(hd->arch.vtd.agaw);
> -    int offset;
> +    unsigned int level = agaw_to_level(hd->arch.vtd.agaw), offset;
>      u64 pte_maddr = 0;
>  
>      addr &= (((u64)1) << addr_width) - 1;
>      ASSERT(spin_is_locked(&hd->arch.mapping_lock));
> +    ASSERT(target || !alloc);

Might be better to use an if with ASSERT_UNREACHABLE and return an
error? (ie: level itself?)

> +
>      if ( !hd->arch.vtd.pgd_maddr )
>      {
>          struct page_info *pg;
>  
> -        if ( !alloc || !(pg = iommu_alloc_pgtable(domain)) )
> +        if ( !alloc )
> +            goto out;
> +
> +        pte_maddr = level;
> +        if ( !(pg = iommu_alloc_pgtable(domain)) )
>              goto out;
>  
>          hd->arch.vtd.pgd_maddr = page_to_maddr(pg);
>      }
>  
> -    parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.vtd.pgd_maddr);
> -    while ( level > 1 )
> +    pte_maddr = hd->arch.vtd.pgd_maddr;
> +    parent = map_vtd_domain_page(pte_maddr);
> +    while ( level > target )
>      {
>          offset = address_level_offset(addr, level);
>          pte = &parent[offset];
>  
>          pte_maddr = dma_pte_addr(*pte);
> -        if ( !pte_maddr )
> +        if ( !dma_pte_present(*pte) || (level > 1 && 
> dma_pte_superpage(*pte)) )
>          {
>              struct page_info *pg;
> +            /*
> +             * Higher level tables always set r/w, last level page table
> +             * controls read/write.
> +             */
> +            struct dma_pte new_pte = { DMA_PTE_PROT };
>  
>              if ( !alloc )
> -                break;
> +            {
> +                pte_maddr = 0;
> +                if ( !dma_pte_present(*pte) )
> +                    break;
> +
> +                /*
> +                 * When the leaf entry was requested, pass back the full PTE,
> +                 * with the address adjusted to account for the residual of
> +                 * the walk.
> +                 */
> +                pte_maddr = pte->val +

Wouldn't it be better to use dma_pte_addr(*pte) rather than accessing
pte->val, and then you could drop the PAGE_MASK?

Or is the addr parameter not guaranteed to be page aligned?

> +                    (addr & ((1UL << level_to_offset_bits(level)) - 1) &
> +                     PAGE_MASK);
> +                if ( !target )
> +                    break;

I'm confused by the conditional break here, why would you calculate
pte_maddr unconditionally to get overwritten just the line below if
target != 0?

Thanks, Roger.

Reply via email to