Swapping SG BOs makes no sense, because TTM doesn't own the pages of this type of BO.

Last I checked, userptr BOs (and other SG BOs) were protected from swapout by the fact that they would not be added to the swap-LRU. But it looks like Christian just removed the swap-LRU. I guess this broke that protection:

commit 2cb51d22d70b18eaf339abf9758bf0b7608da65c
Author: Christian König <christian.koe...@amd.com>
Date:   Tue Oct 6 16:30:09 2020 +0200

    drm/ttm: remove swap LRU v3

    Instead evict round robin from each devices SYSTEM and TT domain.

    v2: reorder num_pages access reported by Dan's script
    v3: fix rebase fallout, num_pages should be 32bit

    Signed-off-by: Christian König <christian.koe...@amd.com>
    Tested-by: Nirmoy Das <nirmoy....@amd.com>
    Reviewed-by: Huang Rui <ray.hu...@amd.com>
    Reviewed-by: Matthew Auld <matthew.a...@intel.com>
    Link: https://patchwork.freedesktop.org/patch/424009/

Regards,
  Felix


On 2021-05-18 10:28 p.m., xinhui pan wrote:
cpu 1                                           cpu 2
kfd alloc BO A(userptr)                         alloc BO B(GTT)
     ->init -> validate                           -> init -> validate -> 
populate
     init_user_pages                              -> swapout BO A //hit ttm 
pages limit
        -> get_user_pages (fill up ttm->pages)
         -> validate -> populate
           -> swapin BO A // Now hit the BUG

We know that get_user_pages may race with swapout on same BO.
Threre are some issues I have met.
1) memory corruption.
This is because we do a swap before memory is setup. ttm_tt_swapout()
just create a swap_storage with its content being 0x0. So when we setup
memory after the swapout. The following swapin makes the memory
corrupted.

2) panic
When swapout happes with get_user_pages, they touch ttm->pages without
anylock. It causes memory corruption too. But I hit page fault mostly.

Signed-off-by: xinhui pan <xinhui....@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16 +++++++++++++++-
  1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 928e8d57cd08..42460e4480f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -835,6 +835,7 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t 
user_addr)
        struct amdkfd_process_info *process_info = mem->process_info;
        struct amdgpu_bo *bo = mem->bo;
        struct ttm_operation_ctx ctx = { true, false };
+       struct page **pages;
        int ret = 0;
mutex_lock(&process_info->lock);
@@ -852,7 +853,13 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t 
user_addr)
                goto out;
        }
- ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages);
+       pages = kvmalloc_array(bo->tbo.ttm->num_pages,
+                       sizeof(struct page *),
+                       GFP_KERNEL | __GFP_ZERO);
+       if (!pages)
+               goto unregister_out;
+
+       ret = amdgpu_ttm_tt_get_user_pages(bo, pages);
        if (ret) {
                pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
                goto unregister_out;
@@ -863,6 +870,12 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t 
user_addr)
                pr_err("%s: Failed to reserve BO\n", __func__);
                goto release_out;
        }
+
+       WARN_ON_ONCE(bo->tbo.ttm->page_flags & TTM_PAGE_FLAG_SWAPPED);
+
+       memcpy(bo->tbo.ttm->pages,
+                       pages,
+                       sizeof(struct page*) * bo->tbo.ttm->num_pages);
        amdgpu_bo_placement_from_domain(bo, mem->domain);
        ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
        if (ret)
@@ -872,6 +885,7 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t 
user_addr)
  release_out:
        amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
  unregister_out:
+       kvfree(pages);
        if (ret)
                amdgpu_mn_unregister(bo);
  out:

Reply via email to