[AMD Official Use Only - AMD Internal Distribution Only]

I might have worded that poorly, I meant that it seems like TLB flush is out of 
sync with the SDMA update, which leads to a page fault reliably. I don't feel 
it has anything to do with the implicit sync in itself. When TLB fence is 
created it's added to the dma_resv of the vm's root buffer object with BOOKKEEP 
usage specified, in order to make sure no PD/PT is freed before the flush. But 
I don't think it's being added as a job dependency within the 
amdgpu_vm_sdma_update, we're adding all the fences found within the dma_resv 
object with KERNEL usage specified. I may be missing something so I'd love to 
hear what you think.

Best regards,
Dejan
________________________________
From: Koenig, Christian <christian.koe...@amd.com>
Sent: Thursday, September 5, 2024 2:40 PM
To: Andjelkovic, Dejan <dejan.andjelko...@amd.com>; 
amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Prica, Nikola <nikola.pr...@amd.com>; Kuehling, Felix 
<felix.kuehl...@amd.com>; Deng, Emily <emily.d...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: Raise dma resv usage for created TLB fence

Well that explanation doesn't seem to make much sense either.

What do you mean with TLB flush is occurring prematurely?

Regards,
Christian.

Am 05.09.24 um 14:38 schrieb Andjelkovic, Dejan:

[AMD Official Use Only - AMD Internal Distribution Only]

Hi there. We're running into a page fault issue that's very easily reproducible 
on a SRIOV environment when using SDMA for page table updates. Going through 
mapping logs and trace files, it seems TLB flush is occurring prematurely. 
Changing the usage to KERNEL completely stops the page fault from occurring 
with no performance impact, which was confirmed with extensive testing. Looking 
through amdgpu_vm_sdma.c, namely within amdgpu_vm_sdma_update when waiting for 
PD/PT moves to be completed, fences are iterated with KERNEL usage specified 
which are then added as a job dependency.

Best regards,
Dejan
________________________________
From: Koenig, Christian 
<christian.koe...@amd.com><mailto:christian.koe...@amd.com>
Sent: Thursday, September 5, 2024 1:17 PM
To: Andjelkovic, Dejan 
<dejan.andjelko...@amd.com><mailto:dejan.andjelko...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
<amd-gfx@lists.freedesktop.org><mailto:amd-gfx@lists.freedesktop.org>
Cc: Prica, Nikola <nikola.pr...@amd.com><mailto:nikola.pr...@amd.com>; 
Kuehling, Felix <felix.kuehl...@amd.com><mailto:felix.kuehl...@amd.com>; Deng, 
Emily <emily.d...@amd.com><mailto:emily.d...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: Raise dma resv usage for created TLB fence

Am 05.09.24 um 10:58 schrieb Dejan Andjelkovic:
> When using SDMA for PT updates, a TLB fence hooked to a buffer
> objects dma resv object with usage declared as BOOKKEEP leaves a
> chance for TLB flush to occur prematurely. This will lead to a page
> fault. Raising the usage from BOOKKEEP to KERNEL removes this
> possibility.

Well that's complete nonsense. The usage model is for implicit syncing
and not even remotely related to TLB flushing.

What exactly is the problem you run into?

Regards,
Christian.

>
> Signed-off-by: Dejan Andjelkovic 
> <dejan.andjelko...@amd.com><mailto:dejan.andjelko...@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index f93804902fd3..8efc2cf9bbb0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -928,7 +928,7 @@ amdgpu_vm_tlb_flush(struct amdgpu_vm_update_params 
> *params,
>
>                /* Makes sure no PD/PT is freed before the flush */
>                dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence,
> -                                DMA_RESV_USAGE_BOOKKEEP);
> +                                DMA_RESV_USAGE_KERNEL);
>        }
>   }
>


Reply via email to