On Thu, Oct 08, 2015 at 03:03:11PM +0100, Tvrtko Ursulin wrote:
> 
> Hi,
> 
> On 08/10/15 14:45, Ville Syrjälä wrote:
> > On Mon, Oct 05, 2015 at 01:26:36PM +0100, Tvrtko Ursulin wrote:
> >> From: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
> >>
> >> Prevent leaking VMAs and PPGTT VMs when objects are imported
> >> via flink.
> >>
> >> Scenario is that any VMAs created by the importer will be left
> >> dangling after the importer exits, or destroys the PPGTT context
> >> with which they are associated.
> >>
> >> This is caused by object destruction not running when the
> >> importer closes the buffer object handle due the reference held
> >> by the exporter. This also leaks the VM since the VMA has a
> >> reference on it.
> >>
> >> In practice these leaks can be observed by stopping and starting
> >> the X server on a kernel with fbcon compiled in. Every time
> >> X server exits another VMA will be leaked against the fbcon's
> >> frame buffer object.
> >>
> >> Also on systems where flink buffer sharing is used extensively,
> >> like Android, this leak has even more serious consequences.
> >>
> >> This version is takes a general approach from the  earlier work
> >> by Rafael Barbalho (drm/i915: Clean-up PPGTT on context
> >> destruction) and tries to incorporate the subsequent discussion
> >> between Chris Wilson and Daniel Vetter.
> >>
> >> v2:
> >>
> >> Removed immediate cleanup on object retire - it was causing a
> >> recursive VMA unbind via i915_gem_object_wait_rendering. And
> >> it is in fact not even needed since by definition context
> >> cleanup worker runs only after the last context reference has
> >> been dropped, hence all VMAs against the VM belonging to the
> >> context are already on the inactive list.
> >>
> >> v3:
> >>
> >> Previous version could deadlock since VMA unbind waits on any
> >> rendering on an object to complete. Objects can be busy in a
> >> different VM which would mean that the cleanup loop would do
> >> the wait with the struct mutex held.
> >>
> >> This is an even simpler approach where we just unbind VMAs
> >> without waiting since we know all VMAs belonging to this VM
> >> are idle, and there is nothing in flight, at the point
> >> context destructor runs.
> >>
> >> v4:
> >>
> >> Double underscore prefix for __915_vma_unbind_no_wait and a
> >> commit message typo fix. (Michel Thierry)
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
> >> Testcase: igt/gem_ppgtt.c/flink-and-exit-vma-leak
> >> Reviewed-by: Michel Thierry <michel.thie...@intel.com>
> >> Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
> >> Cc: Chris Wilson <ch...@chris-wilson.co.uk>
> >> Cc: Rafael Barbalho <rafael.barba...@intel.com>
> >> Cc: Michel Thierry <michel.thie...@intel.com>
> >
> >> ---
> >>   drivers/gpu/drm/i915/i915_drv.h         |  5 +++++
> >>   drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++++++++----
> >>   drivers/gpu/drm/i915/i915_gem_context.c | 24 ++++++++++++++++++++++++
> >>   3 files changed, 45 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
> >> b/drivers/gpu/drm/i915/i915_drv.h
> >> index 824e7245044d..58afa1bb2957 100644
> >> --- a/drivers/gpu/drm/i915/i915_drv.h
> >> +++ b/drivers/gpu/drm/i915/i915_drv.h
> >> @@ -2840,6 +2840,11 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object 
> >> *obj,
> >>   int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level 
> >> cache_level,
> >>              u32 flags);
> >>   int __must_check i915_vma_unbind(struct i915_vma *vma);
> >> +/*
> >> + * BEWARE: Do not use the function below unless you can _absolutely_
> >> + * _guarantee_ VMA in question is _not in use_ anywhere.
> >> + */
> >> +int __must_check __i915_vma_unbind_no_wait(struct i915_vma *vma);
> >>   int i915_gem_object_put_pages(struct drm_i915_gem_object *obj);
> >>   void i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv);
> >>   void i915_gem_release_mmap(struct drm_i915_gem_object *obj);
> >> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> >> b/drivers/gpu/drm/i915/i915_gem.c
> >> index f0cfbb9ee12c..52642aff1dab 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >> @@ -3208,7 +3208,7 @@ static void i915_gem_object_finish_gtt(struct 
> >> drm_i915_gem_object *obj)
> >>                                        old_write_domain);
> >>   }
> >>
> >> -int i915_vma_unbind(struct i915_vma *vma)
> >> +static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
> >>   {
> >>    struct drm_i915_gem_object *obj = vma->obj;
> >>    struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
> >> @@ -3227,9 +3227,11 @@ int i915_vma_unbind(struct i915_vma *vma)
> >>
> >>    BUG_ON(obj->pages == NULL);
> >>
> >> -  ret = i915_gem_object_wait_rendering(obj, false);
> >> -  if (ret)
> >> -          return ret;
> >> +  if (wait) {
> >> +          ret = i915_gem_object_wait_rendering(obj, false);
> >> +          if (ret)
> >> +                  return ret;
> >> +  }
> >>
> >>    if (i915_is_ggtt(vma->vm) &&
> >>        vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
> >> @@ -3274,6 +3276,16 @@ int i915_vma_unbind(struct i915_vma *vma)
> >>    return 0;
> >>   }
> >>
> >> +int i915_vma_unbind(struct i915_vma *vma)
> >> +{
> >> +  return __i915_vma_unbind(vma, true);
> >> +}
> >> +
> >> +int __i915_vma_unbind_no_wait(struct i915_vma *vma)
> >> +{
> >> +  return __i915_vma_unbind(vma, false);
> >> +}
> >> +
> >>   int i915_gpu_idle(struct drm_device *dev)
> >>   {
> >>    struct drm_i915_private *dev_priv = dev->dev_private;
> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
> >> b/drivers/gpu/drm/i915/i915_gem_context.c
> >> index 74aa0c9929ba..680b4c9f6b73 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >> @@ -133,6 +133,23 @@ static int get_context_size(struct drm_device *dev)
> >>    return ret;
> >>   }
> >>
> >> +static void i915_gem_context_clean(struct intel_context *ctx)
> >> +{
> >> +  struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
> >> +  struct i915_vma *vma, *next;
> >> +
> >> +  if (WARN_ON_ONCE(!ppgtt))
> >> +          return;
> >
> > [   80.242892] [drm:i915_gem_open]
> > [   80.250485] ------------[ cut here ]------------
> > [   80.258938] WARNING: CPU: 1 PID: 277 at 
> > ../drivers/gpu/drm/i915/i915_gem_context.c:141 
> > i915_gem_context_free+0xef/0x27b [i915]()
> > [   80.275344] WARN_ON_ONCE(!ppgtt)
> 
> Ha, I did not see how ctx->ppgtt == NULL could exist but wanted to be 
> extra safe since i915_ppgtt_put checks for that.

Well, how can it exit?

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to