On Mon, Jan 27, 2014 at 09:56:12AM -0800, Volkin, Bradley D wrote:
> > +static void
> > +i915_mmu_notifier_del(struct i915_mmu_notifier *mmu,
> > +                 struct i915_mmu_object *mn)
> > +{
> > +   bool destroy;
> > +
> > +   spin_lock(&mmu->lock);
> > +   interval_tree_remove(&mn->it, &mmu->objects);
> > +   destroy = --mmu->count == 0;
> > +   __i915_mmu_notifier_update_serial(mmu);
> > +   spin_unlock(&mmu->lock);
> > +
> > +   if (destroy) /* protected against _add() by struct_mutex */
> > +           __i915_mmu_notifier_destroy(mmu);
> 
> I see that we should hold struct_mutex when this function is called,
> but I don't see that we try to get the mutex anywhere on the _add() path.
> Have I missed something?

No, it's fixed in a later patch. I had assumed the lock was already
taken in the outermost ioctl routine.
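For reference, I mean something of this shape in the ioctl (a sketch
only; the actual call site lands in the later patch):

        ret = i915_mutex_lock_interruptible(dev);
        if (ret)
                return ret;

        /* i915_mmu_notifier_add()/_del() are called under struct_mutex */

        mutex_unlock(&dev->struct_mutex);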

> > +free_mn:
> > +   kfree(mn);
> > +destroy_mmu:
> > +   if (mmu->count == 0)
> > +           __i915_mmu_notifier_destroy(mmu);
> 
> Other accesses to mmu->count are protected by mmu->lock. Again, I may
> have missed something but don't immediately see why that's not
> required.

mmu->count is protected by struct_mutex. See above.
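We could make that assumption explicit with lockdep, e.g. (a sketch,
assuming the drm_device is reachable at that point):

        lockdep_assert_held(&dev->struct_mutex);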

> > +   if (pinned < num_pages) {
> > +           if (pinned < 0) {
> > +                   ret = pinned;
> > +                   pinned = 0;
> > +           } else {
> > +                   /* Spawn a worker so that we can acquire the
> > +                    * user pages without holding our mutex.
> > +                    */
> > +                   ret = -EAGAIN;
> > +                   if (obj->userptr.work == NULL) {
> > +                           struct get_pages_work *work;
> > +
> > +                           work = kmalloc(sizeof(*work), GFP_KERNEL);
> > +                           if (work != NULL) {
> > +                                   obj->userptr.work = &work->work;
> > +
> > +                                   work->obj = obj;
> > +                                   drm_gem_object_reference(&obj->base);
> > +
> > +                                   work->task = current;
> > +                                   get_task_struct(work->task);
> > +
> > +                                   INIT_WORK(&work->work, __i915_gem_userptr_get_pages_worker);
> > +                                   schedule_work(&work->work);
> 
> Any reason to use the system wq instead of the driver wq here?
> It doesn't look like it's the usual "takes modeset locks" justification.

Performance. Using the driver workqueue would serialise the clients,
whereas with the system workqueue we can do the page faulting in
parallel.
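To spell that out (a sketch; iirc dev_priv->wq is an ordered workqueue,
so work items queued on it run strictly one at a time):

        /* system wq: workers may run concurrently, one fault per client */
        schedule_work(&work->work);

        /* driver wq: ordered, so all clients would be serialised */
        queue_work(dev_priv->wq, &work->work);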
> 
> > +                           } else
> > +                                   ret = -ENOMEM;
> > +                   } else {
> > +                           if (IS_ERR(obj->userptr.work)) {
> 
> } else if (...) { ?

No, I think it is clearer as is.
 
> > +                                   ret = PTR_ERR(obj->userptr.work);
> > +                                   obj->userptr.work = NULL;
> > +                           }
> > +                   }
> > +           }
> > +   } else {
> > +           struct sg_table *st;
> > +
> > +           st = kmalloc(sizeof(*st), GFP_KERNEL);
> > +           if (st == NULL || sg_alloc_table(st, num_pages, GFP_KERNEL)) {
> > +                   kfree(st);
> > +                   ret = -ENOMEM;
> > +           } else {
> > +                   struct scatterlist *sg;
> > +                   int n;
> > +
> > +                   for_each_sg(st->sgl, sg, num_pages, n)
> > +                           sg_set_page(sg, pvec[n], PAGE_SIZE, 0);
> > +
> > +                   obj->pages = st;
> > +                   obj->userptr.work = NULL;
> > +
> > +                   pinned = 0;
> > +                   ret = 0;
> > +           }
> 
> This block is almost identical to code in the worker. Would it be
> worth extracting the common parts into a helper?

Almost, but subtly and importantly different. Only the loop was the
same, so I didn't feel the saving was significant. It is now even less
similar.
 
> > +   }
> > +
> > +   release_pages(pvec, pinned, 0);
> > +   drm_free_large(pvec);
> > +   return ret;
> > +}
> > +
> > +static void
> > +i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj)
> > +{
> > +   struct scatterlist *sg;
> > +   int i;
> > +
> > +   if (obj->madv != I915_MADV_WILLNEED)
> > +           obj->dirty = 0;
> 
> This is subtly different from similar code in the standard put_pages()
> in that it sets dirty=0 for both DONTNEED and PURGED instead of just
> DONTNEED (w/ BUG_ON(PURGED)). I don't think we will ever actually
> truncate userptr objects, so is there any reason for this to be
> different?

No, there's no reason for the difference. Semantically it is the same, of
course.
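For comparison, the standard put_pages() path does roughly this (quoting
i915_gem.c from memory):

        BUG_ON(obj->madv == __I915_MADV_PURGED);
        ...
        if (obj->madv == I915_MADV_DONTNEED)
                obj->dirty = 0;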

> > +/**
> > + * Creates a new mm object that wraps some normal memory from the process
> > + * context - user memory.
> > + *
> > + * We impose several restrictions upon the memory being mapped
> > + * into the GPU.
> > + * 1. It must be page aligned (both start/end addresses, i.e. ptr and size).
> > + * 2. We only allow a bo as large as we could in theory map into the GTT,
> > + *    that is we limit the size to the total size of the GTT.
> > + * 3. The bo is marked as being snoopable. The backing pages are left
> > + *    accessible directly by the CPU, but reads by the GPU may incur
> > + *    the cost of a snoop (unless you have an LLC architecture).
> 
> No overlapping ranges

Yes, that is an important addition.
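Something like this for the comment block (wording only a suggestion):

 * 4. The ranges must not overlap; creating a second object over an
 *    address range that is already in use is rejected.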
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre