On Mon, 14 Aug 2023 16:53:09 +0100
Steven Price <steven.pr...@arm.com> wrote:

> > +
> > +/**
> > + * struct panthor_vm_op_ctx - VM operation context
> > + *
> > + * With VM operations potentially taking place in a dma-signaling path, we
> > + * need to make sure everything that might require resource allocation is
> > + * pre-allocated upfront. This is what this operation context is for.
> > + *
> > + * We also collect resources that have been freed, so we can release them
> > + * asynchronously, and let the VM_BIND scheduler process the next VM_BIND
> > + * request.
> > + */
> > +struct panthor_vm_op_ctx {
> > +   /** @rsvd_page_tables: Pages reserved for the MMU page table update. */
> > +   struct {
> > +           /** @count: Number of pages reserved. */
> > +           u32 count;
> > +
> > +           /** @ptr: Index of the first unused page in the @pages table. */
> > +           u32 ptr;
> > +
> > +           /**
> > +            * @pages: Array of pages that can be used for an MMU page table update.
> > +            *
> > +            * After a VM operation, there might be free pages left in this array.
> > +            * They should be returned to the pt_cache as part of the op_ctx cleanup.
> > +            */
> > +           void **pages;
> > +   } rsvd_page_tables;  
> 
> Two questions:
> 
> 1) Would a mempool simplify the implementation? It looks like a
> reasonable match.

Not sure what you mean by mempool, but I'm using a kmem_cache here for
all page table allocations. The pages that are passed to
panthor_vm_op_ctx::rsvd_page_tables::pages are allocated from this
pool. It's just that, for each VM operation, we pre-allocate page tables
and release those that were not used when the operation is done (we
over-allocate for the worst-case scenario).
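To make that concrete, the pre-allocate/release cycle boils down to something
like the sketch below. Only struct panthor_vm_op_ctx and the global pt_cache
come from the patch; the helpers and their names are made up for illustration,
and error unwinding is omitted.

#include <linux/slab.h>

extern struct kmem_cache *pt_cache;	/* global cache of 4k page-table pages */

/* Reserve enough page tables for the worst case before entering the
 * dma-signalling path (error unwinding omitted for brevity).
 */
static int op_ctx_prealloc_page_tables(struct panthor_vm_op_ctx *op_ctx,
				       u32 worst_case)
{
	u32 i;

	op_ctx->rsvd_page_tables.pages = kcalloc(worst_case, sizeof(void *),
						 GFP_KERNEL);
	if (!op_ctx->rsvd_page_tables.pages)
		return -ENOMEM;

	for (i = 0; i < worst_case; i++) {
		op_ctx->rsvd_page_tables.pages[i] =
			kmem_cache_alloc(pt_cache, GFP_KERNEL);
		if (!op_ctx->rsvd_page_tables.pages[i])
			return -ENOMEM;
	}

	op_ctx->rsvd_page_tables.count = worst_case;
	op_ctx->rsvd_page_tables.ptr = 0;
	return 0;
}

/* At cleanup time, hand back the pages the page-table update didn't consume. */
static void op_ctx_return_unused_page_tables(struct panthor_vm_op_ctx *op_ctx)
{
	u32 i;

	for (i = op_ctx->rsvd_page_tables.ptr; i < op_ctx->rsvd_page_tables.count; i++)
		kmem_cache_free(pt_cache, op_ctx->rsvd_page_tables.pages[i]);
}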

> 
> 2) Does it really make sense to have a separate pool of memory for every
> operation? Instead of having a separate pool for each operation, it
> would be possible to just keep track of the total number needed for all
> outstanding operations. Then a single (per device or maybe per-VM if
> necessary) mempool could be resized to ensure it has the right amount of
> space.

The pool is per-driver (see the global pt_cache). rsvd_page_tables just
holds pages needed for a specific VM operation. To be more specific, it
holds pages for the worst case (page table tree is empty, except for the
root page table).
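As a back-of-the-envelope illustration (not code from the series), assuming a
4-level, 4k-granule AArch64 table where only the root level exists, the worst
case is one new table per level-0/1/2 entry the mapping touches:

#include <linux/types.h>

/* One new table per level-0/1/2 entry touched by the mapping; with a 4k
 * granule those entries cover 512G, 1G and 2M respectively.
 */
static u32 worst_case_pt_count(u64 va, u64 size)
{
	u64 last = va + size - 1;
	u32 count = 0;
	unsigned int shift;

	for (shift = 39; shift >= 21; shift -= 9)
		count += (last >> shift) - (va >> shift) + 1;

	return count;
}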

> 
> I'm also a little wary that the VM_BIND infrastructure could potentially
> be abused to trigger a large amount of kernel allocation as it allocates
> up-front for the worst case but those pages are not charged to the
> process (AFAICT). But I haven't fully got my head round that yet.

Yep, that's problematic, indeed. I considered allocating page tables
as GEM objects, but the overhead of a GEM object is quite big
(hundreds of bytes of meta-data) compared to the size of a page table
(4k), and kmem_cache was just super convenient for this page table
cache :-).
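For context, the page-table cache itself is nothing more than a global
kmem_cache of naturally-aligned 4k objects, something along these lines (the
cache name and error handling are illustrative):

#include <linux/slab.h>
#include <linux/sizes.h>

static struct kmem_cache *pt_cache;

static int panthor_mmu_pt_cache_init(void)
{
	/* 4k page tables, naturally aligned so the MMU can point at them. */
	pt_cache = kmem_cache_create("panthor-mmu-pt", SZ_4K, SZ_4K, 0, NULL);
	return pt_cache ? 0 : -ENOMEM;
}

static void panthor_mmu_pt_cache_fini(void)
{
	kmem_cache_destroy(pt_cache);
}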

> 
> > +
> > +   /** @flags: Combination of drm_panthor_vm_bind_op_flags. */
> > +   u32 flags;
> > +
> > +   /** @va: Virtual range targeted by the VM operation. */
> > +   struct {
> > +           /** @addr: Start address. */
> > +           u64 addr;
> > +
> > +           /** @range: Range size. */
> > +           u64 range;
> > +   } va;
> > +
> > +   /**
> > +    * @returned_vmas: List of panthor_vma objects returned after a VM operation.
> > +    *
> > +    * For unmap operations, this will contain all VMAs that were covered by the
> > +    * specified VA range.
> > +    *
> > +    * For map operations, this will contain all VMAs that previously mapped to
> > +    * the specified VA range.
> > +    *
> > +    * Those VMAs, and the resources they point to will be released as part of
> > +    * the op_ctx cleanup operation.
> > +    */
> > +   struct list_head returned_vmas;
> > +
> > +   /** @map: Fields specific to a map operation. */
> > +   struct {
> > +           /** @gem: GEM object information. */
> > +           struct {
> > +                   /** @obj: GEM object to map. */
> > +                   struct drm_gem_object *obj;
> > +
> > +                   /** @offset: Offset in the GEM object. */
> > +                   u64 offset;
> > +           } gem;
> > +
> > +           /**
> > +            * @sgt: sg-table pointing to pages backing the GEM object.
> > +            *
> > +            * This is gathered at job creation time, such that we don't have
> > +            * to allocate in ::run_job().
> > +            */
> > +           struct sg_table *sgt;
> > +
> > +           /**
> > +            * @prev_vma: Pre-allocated VMA object to deal with a remap situation.
> > +            *
> > +            * If the map request covers a region that's inside another VMA, the
> > +            * previous VMA will be split, requiring instantiation of a maximum of
> > +            * two new VMA objects.
> > +            */
> > +           struct panthor_vma *prev_vma;
> > +
> > +           /**
> > +            * @new_vma: The new VMA object that will be inserted to the VA tree.
> > +            */
> > +           struct panthor_vma *new_vma;
> > +
> > +           /**
> > +            * @next_vma: Pre-allocated VMA object to deal with a remap situation.
> > +            *
> > +            * See @prev_vma.
> > +            */
> > +           struct panthor_vma *next_vma;  
> 
> It's probably premature optimization, but it feels like having a cache
> of these VMA structures might be an idea.

If it's needed, I'll probably go for a kmem_cache, but I need to
check if it's worth it first (i.e. whether the closest kmalloc cache is
significantly bigger than the struct size).
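A quick way to do that check, assuming kmalloc_size_roundup() (available since
v6.1) is acceptable here; this is just to illustrate the comparison, not
something from the series:

#include <linux/slab.h>

/* True if the generic kmalloc bucket would waste a meaningful amount of
 * memory compared to a dedicated cache (25% threshold picked arbitrarily).
 */
static bool dedicated_cache_worth_it(size_t obj_size)
{
	return kmalloc_size_roundup(obj_size) > obj_size + obj_size / 4;
}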

> I'm also struggling to
> understand how both a new prev and new next VMA are needed - but I
> haven't dug into the GPU VA manager.

prev/next are for mapping splits: an object is already mapped, and a new
object is mapped in the middle of this pre-existing mapping. In that
case, we need two VMA objects for the preceding and succeeding mappings,
since the old mapping object will be released.

new_vma is for the new mapping.
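As a toy illustration (not driver code): if [old.start, old.end) is already
mapped and the new request targets a range strictly inside it, the old VMA
goes away and up to two leftover chunks have to be re-created around the new
one:

#include <linux/types.h>

struct va_range { u64 start, end; };

/* Existing mapping 'old', new mapping 'map' landing inside it: compute the
 * two leftover chunks. Either chunk can end up empty (start == end) when the
 * new mapping touches one of the old boundaries, which is why prev/next are
 * only pre-allocated, not always used.
 */
static void split_for_remap(struct va_range old, struct va_range map,
			    struct va_range *prev, struct va_range *next)
{
	prev->start = old.start;
	prev->end = map.start;

	next->start = map.end;
	next->end = old.end;
}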

> 
> > +   } map;
> > +};
> > +

[...]

> > +/**
> > + * panthor_vm_active() - Flag a VM as active
> > + * @VM: VM to flag as active.
> > + *
> > + * Assigns an address space to a VM so it can be used by the GPU/MCU.
> > + *
> > + * Return: 0 on success, a negative error code otherwise.
> > + */
> > +int panthor_vm_active(struct panthor_vm *vm)
> > +{
> > +   struct panthor_device *ptdev = vm->ptdev;
> > +   struct io_pgtable_cfg *cfg = &io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg;
> > +   int ret = 0, as, cookie;
> > +   u64 transtab, transcfg;
> > +
> > +   if (!drm_dev_enter(&ptdev->base, &cookie))
> > +           return -ENODEV;
> > +
> > +   mutex_lock(&ptdev->mmu->as.slots_lock);
> > +
> > +   as = vm->as.id;
> > +   if (as >= 0) {
> > +           u32 mask = panthor_mmu_as_fault_mask(ptdev, as);
> > +
> > +           if (ptdev->mmu->as.faulty_mask & mask) {
> > +                   /* Unhandled pagefault on this AS, the MMU was
> > +                    * disabled. We need to re-enable the MMU after
> > +                    * clearing+unmasking the AS interrupts.
> > +                    */
> > +                   gpu_write(ptdev, MMU_INT_CLEAR, mask);
> > +                   ptdev->mmu->as.faulty_mask &= ~mask;
> > +                   gpu_write(ptdev, MMU_INT_MASK, ~ptdev->mmu->as.faulty_mask);
> > +                   goto out_enable_as;
> > +           }
> > +
> > +           goto out_unlock;
> > +   }
> > +
> > +   /* Check for a free AS */
> > +   if (vm->for_mcu) {
> > +           drm_WARN_ON(&ptdev->base, ptdev->mmu->as.alloc_mask & BIT(0));
> > +           as = 0;
> > +   } else {
> > +           as = ffz(ptdev->mmu->as.alloc_mask | BIT(0));
> > +   }
> > +
> > +   if (!(BIT(as) & ptdev->gpu_info.as_present)) {
> > +           struct panthor_vm *lru_vm;
> > +
> > +           lru_vm = list_first_entry_or_null(&ptdev->mmu->as.lru_list,
> > +                                             struct panthor_vm,
> > +                                             as.lru_node);
> > +           if (drm_WARN_ON(&ptdev->base, !lru_vm)) {
> > +                   ret = -EBUSY;
> > +                   goto out_unlock;
> > +           }
> > +
> > +           list_del_init(&lru_vm->as.lru_node);
> > +           as = lru_vm->as.id;  
> 
> Should this not set lru_vm->as.id = -1, so that the code knows the VM no
> longer has an address space?

Good catch!
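For the record, the fix is basically a one-liner in the reclaim path quoted
above (sketch only, the final patch might need more bookkeeping around the
slot):

	list_del_init(&lru_vm->as.lru_node);
	as = lru_vm->as.id;
	/* The evicted VM no longer owns an address space. */
	lru_vm->as.id = -1;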

> 
> > +   } else {
> > +           set_bit(as, &ptdev->mmu->as.alloc_mask);
> > +   }
> > +
> > +   /* Assign the free or reclaimed AS to the FD */
> > +   vm->as.id = as;
> > +   ptdev->mmu->as.slots[as].vm = vm;
> > +
> > +out_enable_as:
> > +   transtab = cfg->arm_lpae_s1_cfg.ttbr;
> > +   transcfg = AS_TRANSCFG_PTW_MEMATTR_WB |
> > +              AS_TRANSCFG_PTW_RA |
> > +              AS_TRANSCFG_ADRMODE_AARCH64_4K;
> > +   if (ptdev->coherent)
> > +           transcfg |= AS_TRANSCFG_PTW_SH_OS;
> > +
> > +   ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab, transcfg, vm->memattr);
> > +
> > +out_unlock:
> > +   mutex_unlock(&ptdev->mmu->as.slots_lock);
> > +   drm_dev_exit(cookie);
> > +   return ret;
> > +}
> > +

[...]

> > +
> > +static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
> > +{
> > +   status = panthor_mmu_fault_mask(ptdev, status);
> > +   while (status) {
> > +           u32 as = ffs(status | (status >> 16)) - 1;
> > +           u32 mask = panthor_mmu_as_fault_mask(ptdev, as);
> > +           u32 new_int_mask;
> > +           u64 addr;
> > +           u32 fault_status;
> > +           u32 exception_type;
> > +           u32 access_type;
> > +           u32 source_id;
> > +
> > +           fault_status = gpu_read(ptdev, AS_FAULTSTATUS(as));
> > +           addr = gpu_read(ptdev, AS_FAULTADDRESS_LO(as));
> > +           addr |= (u64)gpu_read(ptdev, AS_FAULTADDRESS_HI(as)) << 32;
> > +
> > +           /* decode the fault status */
> > +           exception_type = fault_status & 0xFF;
> > +           access_type = (fault_status >> 8) & 0x3;
> > +           source_id = (fault_status >> 16);
> > +
> > +           /* Page fault only */  
> 
> This comment makes no sense - it looks like it's copied over from panfrost.

Uh, it made sense before I dropped map/alloc-on-fault :-).

> 
> If I understand correctly we don't (currently) support growing on page
> fault - and it's not really needed now the MCU can handle the tiler heaps.

Exactly. Map/alloc-on-fault is a bit challenging because of the whole
'we have to guarantee that a job is done in finite time, and we must
make sure fence signaling is not blocked on allocation'. Given
drm_gem_get_pages() doesn't do non-blocking allocations, I thought it'd
be preferable to postpone map-on-fault until we actually decide we need
it. Note that i915 seems to have some sort of non-blocking page
allocator in shmem_sg_alloc_table()[1].
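For reference, by 'non-blocking' I mean something along these lines, i.e. an
allocation that fails fast instead of sleeping in reclaim (illustration only):

#include <linux/gfp.h>

/* Fail fast instead of entering reclaim: the kind of allocation that would be
 * acceptable in a fault/fence-signalling path.
 */
static struct page *alloc_page_nonblocking(void)
{
	return alloc_page(GFP_NOWAIT | __GFP_NOWARN);
}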

> 
> > +           mutex_lock(&ptdev->mmu->as.slots_lock);
> > +
> > +           new_int_mask =
> > +                   panthor_mmu_fault_mask(ptdev, ~ptdev->mmu->as.faulty_mask);
> > +
> > +           /* terminal fault, print info about the fault */
> > +           drm_err(&ptdev->base,
> > +                   "Unhandled Page fault in AS%d at VA 0x%016llX\n"
> > +                   "raw fault status: 0x%X\n"
> > +                   "decoded fault status: %s\n"
> > +                   "exception type 0x%X: %s\n"
> > +                   "access type 0x%X: %s\n"
> > +                   "source id 0x%X\n",
> > +                   as, addr,
> > +                   fault_status,
> > +                   (fault_status & (1 << 10) ? "DECODER FAULT" : "SLAVE FAULT"),
> > +                   exception_type, panthor_exception_name(ptdev, exception_type),
> > +                   access_type, access_type_name(ptdev, fault_status),
> > +                   source_id);
> > +
> > +           /* Ignore MMU interrupts on this AS until it's been
> > +            * re-enabled.
> > +            */
> > +           ptdev->mmu->irq.mask = new_int_mask;
> > +           gpu_write(ptdev, MMU_INT_MASK, new_int_mask);
> > +
> > +           /* Disable the MMU to kill jobs on this AS. */
> > +           panthor_mmu_as_disable(ptdev, as);
> > +           mutex_unlock(&ptdev->mmu->as.slots_lock);
> > +
> > +           status &= ~mask;
> > +   }
> > +}
> > +PANTHOR_IRQ_HANDLER(mmu, MMU, panthor_mmu_irq_handler);
> > +

[...]

> > +
> > +/**
> > + * panthor_mmu_unplug() - Unplug the MMU logic
> > + * @ptdev: Device.
> > + *
> > + * No access to the MMU regs should be done after this function is called.
> > + * We suspend the IRQ and disable all VMs to guarantee that.
> > + */
> > +void panthor_mmu_unplug(struct panthor_device *ptdev)
> > +{
> > +   if (ptdev->mmu->irq.irq > 0)  
> 
> In what situation is this not true? AFAICT the driver probe will fail if
> the IRQ can't be obtained.

Right, I'll drop this test.

[1]https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/i915/gem/i915_gem_shmem.c#L63
