On Fri, Nov 01, 2019 at 02:44:51PM +0000, Yang, Philip wrote:
> @@ -854,12 +853,20 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>               r = -EPERM;
>               goto out_unlock;
>       }
> +     up_read(&mm->mmap_sem);
> +     timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
> +
> +retry:
> +     range->notifier_seq = mmu_range_read_begin(&bo->notifier);
>  
> +     down_read(&mm->mmap_sem);
>       r = hmm_range_fault(range, 0);
>       up_read(&mm->mmap_sem);
> -
> -     if (unlikely(r < 0))
> +     if (unlikely(r <= 0)) {
> +             if ((r == 0 || r == -EBUSY) && !time_after(jiffies, timeout))
> +                     goto retry;
>               goto out_free_pfns;
> +     }

I was reflecting on why this suddenly became necessary, and I think
what might be happening is that hmm_range_fault() is triggering
invalidations as it runs (i.e. it is faulting in pages or something)
and that in turn causes the mrn (mmu range notifier) to need a retry.

The hmm version of this had a bug where a full
invalidate_range_start/end pair would not trigger a retry, so this
didn't happen.

This is unfortunate, as the retry is unnecessary, but at this time I
can't think of a good way to separate an ignorable synchronous
invalidation caused by hmm_range_fault itself from an asynchronous one
that cannot be ignored..

A basic fix would be to not update the mrn seq in the notifier if
the invalidation is triggered by hmm_range_fault, but that seems
difficult to determine..

Any thoughts Jerome?

Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
