On Fri, Oct 14, 2016 at 02:32:03PM +0100, Tvrtko Ursulin wrote:
> 
> On 14/10/2016 13:18, Chris Wilson wrote:
> >A while ago we switched from a contiguous array of pages into an sglist,
> >for that was both more convenient for mapping to hardware and avoided
> >the requirement for a vmalloc array of pages on every object. However,
> >certain GEM API calls (like pwrite, pread as well as performing
> >relocations) do desire access to individual struct pages. A quick hack
> >was to introduce a cache of the last access such that finding the
> >following page was quick - this works so long as the caller desired
> >sequential access. Walking backwards, or multiple callers, still hits a
> >slow linear search for each page. One solution is to store each
> >successful lookup in a radix tree.
> >
> >v2: Rewrite building the radixtree for clarity, hopefully.
> >
> >Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> >---
> >  drivers/gpu/drm/i915/i915_drv.h         |  69 +++++-------
> >  drivers/gpu/drm/i915/i915_gem.c         | 179 
> > +++++++++++++++++++++++++++++---
> >  drivers/gpu/drm/i915/i915_gem_stolen.c  |   4 +-
> >  drivers/gpu/drm/i915/i915_gem_userptr.c |   4 +-
> >  4 files changed, 193 insertions(+), 63 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h 
> >b/drivers/gpu/drm/i915/i915_drv.h
> >index 38467dde1efe..53cf4b0e5359 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -2273,9 +2273,12 @@ struct drm_i915_gem_object {
> >     struct sg_table *pages;
> >     int pages_pin_count;
> >-    struct get_page {
> >-            struct scatterlist *sg;
> >-            int last;
> >+    struct i915_gem_object_page_iter {
> >+            struct scatterlist *sg_pos;
> >+            unsigned long sg_idx;
> 
> We are not consistent in the code with type used for number of pages
> in an object. sg_alloc_table even takes an unsigned int for it, so
> for as long as we build them as we do, unsigned long is a waste.

I know. It's worrying, today there is a possibility that we overflow a
32bit size. If this was counting in pages, we would have a few more
years of grace. All I can say is that we are fortunate that memory
remains expensive in the exabyte range.

> >@@ -4338,6 +4349,8 @@ void i915_gem_object_init(struct drm_i915_gem_object 
> >*obj,
> >     obj->frontbuffer_ggtt_origin = ORIGIN_GTT;
> >     obj->madv = I915_MADV_WILLNEED;
> >+    INIT_RADIX_TREE(&obj->get_page.radix, GFP_ATOMIC | __GFP_NOWARN);
> 
> Pros & cons of GFP_ATOMIC here? Versus perhaps going with the mutex?
> I don't know how much data radix tree allocates with this, per node,
> but we can have a lot of pages. Would this create extra pressure on
> slab shrinking, and in turn out objects?

The problem is that we require sg lookup on a !pagefault path, hence
mutexes and GFP_KERNEL turn out to be illegal. :|

> >+            /* If we cannot allocate and insert this entry, or the
> >+             * individual pages from this range, cancel updating the
> >+             * sg_idx so that on this lookup we are forced to linearly
> >+             * scan onwards, but on future lookups we will try the
> >+             * insertion again (in which case we need to be careful of
> >+             * the error return reporting that we have already inserted
> >+             * this index).
> >+             */
> >+            ret = radix_tree_insert(&iter->radix, idx, sg);
> >+            if (ret && ret != -EEXIST)
> >+                    goto scan;
> 
> What other error can we get here? Internal allocation failure?

Yes. ENOMEM is the only other error.
> 
> >+
> >+            exception =
> >+                    RADIX_TREE_EXCEPTIONAL_ENTRY |
> >+                    idx << RADIX_TREE_EXCEPTIONAL_SHIFT;
> >+            for (i = 1; i < count; i++) {
> >+                    ret = radix_tree_insert(&iter->radix, idx + i,
> >+                                            (void *)exception);
> >+                    if (ret && ret != -EEXIST)
> >+                            goto scan;
> >+            }
> >+
> >+            idx += count;
> >+            sg = ____sg_next(sg);
> >+            count = __sg_page_count(sg);
> >+    }
> >+
> >+scan:
> >+    iter->sg_pos = sg;
> >+    iter->sg_idx = idx;
> >+
> >+    spin_unlock(&iter->lock);
> >+
> >+    if (unlikely(n < idx)) /* insertion completed by another thread */
> >+            goto lookup;
> >+
> >+    /* In case we failed to insert the entry into the radixtree, we need
> >+     * to look beyond the current sg.
> >+     */
> >+    while (idx + count <= n) {
> >+            idx += count;
> >+            sg = ____sg_next(sg);
> >+            count = __sg_page_count(sg);
> >+    }
> >+
> 
> Hmm... I assume failures happen then since you added this fallback.
> Due GFP_ATOMIC?

No, this was always in the code, because malloc failures happen. Quite
often if you run igt ;)
 
> >+    *offset = n - idx;
> >+    return sg;
> >+
> >+lookup:
> >+    rcu_read_lock();
> >+
> 
> Property of the radix tree implementation that the RCU lock is sufficient?

Yes. Lookups are RCU safe (the slots are freed via RCU). Writers must be
serialised by the caller.

> >diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c 
> >b/drivers/gpu/drm/i915/i915_gem_stolen.c
> >index f4f6d3a48b05..70e61bc35c60 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> >@@ -595,8 +595,8 @@ _i915_gem_object_create_stolen(struct drm_device *dev,
> >     if (obj->pages == NULL)
> >             goto cleanup;
> >-    obj->get_page.sg = obj->pages->sgl;
> >-    obj->get_page.last = 0;
> >+    obj->get_page.sg_pos = obj->pages->sgl;
> >+    obj->get_page.sg_idx = 0;
> >     i915_gem_object_pin_pages(obj);
> >     obj->stolen = stolen;
> >diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c 
> >b/drivers/gpu/drm/i915/i915_gem_userptr.c
> >index 1c891b92ac80..cb95789da76e 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> >@@ -526,8 +526,8 @@ __i915_gem_userptr_get_pages_worker(struct work_struct 
> >*_work)
> >                     if (ret == 0) {
> >                             list_add_tail(&obj->global_list,
> >                                           &to_i915(dev)->mm.unbound_list);
> >-                            obj->get_page.sg = obj->pages->sgl;
> >-                            obj->get_page.last = 0;
> >+                            obj->get_page.sg_pos = obj->pages->sgl;
> >+                            obj->get_page.sg_idx = 0;
> 
> Almost like these ones would be better in a helper like
> i915_gem_object_init_page_lookup or something.

Yup.  I think I have a patch for that, with your r-b on it :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to