amdgpu: Handle GPU page faults correctly on non-4K page systems

Alex Deucher Wed, 25 Mar 2026 11:36:35 -0700

On Wed, Mar 25, 2026 at 2:05 PM Donet Tom <[email protected]> wrote:
>
>
> On 3/24/26 6:40 PM, Alex Deucher wrote:
> > Applied.  Thanks!
>
> Hi @Alex
>
> Thank you for applying this patch.
>
>
> I am planning to send the next version for PATCH 1/6. For the
> other patches that have already received Reviewed-by tags,
> would you prefer to pick them from this series, or should I
> include them again in the next version?


I'll pick up the reviewed patches.  Feel free to include them in your
resend if that's easier for you.  I'll pick up whatever the delta is
once those are reviewed.

Alex

>
> -Donet
>
>
> >
> > Alex
> >
> > On Mon, Mar 23, 2026 at 9:04 AM Christian König
> > <[email protected]> wrote:
> >> On 3/23/26 05:28, Donet Tom wrote:
> > >>> During a GPU page fault, the driver restores the SVM range and then
> > >>> maps it into the GPU page tables. The current implementation passes a
> > >>> GPU-page-size (4K-based) PFN to svm_range_restore_pages() to restore
> > >>> the range.
> >>>
> > >>> SVM ranges are tracked using system-page-size PFNs. On systems where
> > >>> the system page size is larger than 4K, using GPU-page-size PFNs to
> > >>> restore the range causes two problems:
> >>>
> > >>> Range lookup failure:
> > >>> Because the restore function receives PFNs in GPU (4K) units, the SVM
> > >>> range lookup does not find the existing range, and a duplicate SVM
> > >>> range is created.
> >>>
> > >>> VMA lookup failure:
> > >>> The restore function also tries to locate the VMA for the faulting
> > >>> address. It converts the GPU-page-size PFN into an address using the
> > >>> system page size, which results in an incorrect address on non-4K
> > >>> page-size systems. As a result, the VMA lookup fails with the message:
> > >>> "address 0xxxx VMA is removed".
> >>>
> >>> This patch passes the system-page-size PFN to svm_range_restore_pages() so
> >>> that the SVM range is restored correctly on non-4K page systems.
> >>>
> >>> Signed-off-by: Donet Tom <[email protected]>
> >> Acked-by: Christian König <[email protected]>
> >>
> >>> ---
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 +++---
> >>>   1 file changed, 3 insertions(+), 3 deletions(-)
> >>>
> > >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> >>> index 6a2ea200d90c..7a3cb0057ac5 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > >>> @@ -2985,14 +2985,14 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
> >>>        if (!root)
> >>>                return false;
> >>>
> >>> -     addr /= AMDGPU_GPU_PAGE_SIZE;
> >>> -
> > >>>        if (is_compute_context && !svm_range_restore_pages(adev, pasid, vmid,
> >>> -         node_id, addr, ts, write_fault)) {
> >>> +         node_id, addr >> PAGE_SHIFT, ts, write_fault)) {
> >>>                amdgpu_bo_unref(&root);
> >>>                return true;
> >>>        }
> >>>
> >>> +     addr /= AMDGPU_GPU_PAGE_SIZE;
> >>> +
> >>>        r = amdgpu_bo_reserve(root, true);
> >>>        if (r)
> >>>                goto error_unref;
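
To make the unit mismatch in the commit message concrete, here is a small standalone C sketch (not kernel code). It assumes 4 KiB GPU pages and 64 KiB system pages (PAGE_SHIFT of 16). Every ex_-prefixed name, including the simplified range struct, is hypothetical and only mirrors the arithmetic described in the patch, not the real SVM data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed sizes for illustration only: 4 KiB GPU pages (as with
 * AMDGPU_GPU_PAGE_SIZE) and 64 KiB system pages (PAGE_SHIFT == 16).
 */
#define EX_GPU_PAGE_SIZE	4096ULL
#define EX_PAGE_SHIFT		16

/* Hypothetical SVM range, tracked in system-page PFNs (inclusive). */
struct ex_svm_range {
	uint64_t start;
	uint64_t last;
};

/* Pre-patch: byte address divided by the GPU page size (4K PFN). */
static inline uint64_t ex_old_pfn(uint64_t addr)
{
	return addr / EX_GPU_PAGE_SIZE;
}

/* Patched: byte address shifted by the system page shift. */
static inline uint64_t ex_new_pfn(uint64_t addr)
{
	return addr >> EX_PAGE_SHIFT;
}

/* Restore path: PFN converted back to an address with the system page size. */
static inline uint64_t ex_pfn_to_addr(uint64_t pfn)
{
	return pfn << EX_PAGE_SHIFT;
}

/* Range lookup only works when the PFN is in system-page units. */
static inline bool ex_range_contains(const struct ex_svm_range *r,
				     uint64_t pfn)
{
	return pfn >= r->start && pfn <= r->last;
}

/* Worked example: one fault address run through both conversions. */
static void ex_check_example(void)
{
	const uint64_t fault = 0x7fff12345000ULL;
	/* 16 system pages (1 MiB) that contain the faulting address. */
	const struct ex_svm_range r = { 0x7fff1230ULL, 0x7fff123fULL };

	/* Old PFN misses the range and maps back to a bogus address. */
	assert(!ex_range_contains(&r, ex_old_pfn(fault)));
	assert(ex_pfn_to_addr(ex_old_pfn(fault)) == 0x7fff123450000ULL);

	/* New PFN hits the range and maps back to the faulting 64 KiB page. */
	assert(ex_range_contains(&r, ex_new_pfn(fault)));
	assert(ex_pfn_to_addr(ex_new_pfn(fault)) == 0x7fff12340000ULL);
}
```

For a fault at 0x7fff12345000, the pre-patch PFN is 0x7fff12345, which falls outside a range covering system PFNs 0x7fff1230..0x7fff123f and converts back to 0x7fff123450000. The patched PFN 0x7fff1234 is inside the range and converts back to 0x7fff12340000, the 64 KiB page that contains the fault. Keeping addr as a byte address until after the svm_range_restore_pages() call, as the diff does, is what makes a single shift by PAGE_SHIFT sufficient.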

Re: [RESEND RFC PATCH v3 3/6] drm/amdgpu: Handle GPU page faults correctly on non-4K page systems
