On 01.04.25 12:50, SRINIVASAN SHANMUGAM wrote:
>
> On 3/31/2025 6:31 PM, Christian König wrote:
>> This reverts commit c2cc3648ba517a6c270500b5447d5a1efdad5936.
>>
>> Turned out that this has some negative consequences for some workloads.
>> Instead check if the cleaner shader should run directly.
>>
>> Signed-off-by: Christian König <christian.koe...@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 4 ++--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>>   2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> index 802743efa3b3..5eab1c1a380c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> @@ -191,8 +191,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
>>       need_ctx_switch = ring->current_ctx != fence_ctx;
>>       if (ring->funcs->emit_pipeline_sync && job &&
>>           ((tmp = amdgpu_sync_get_fence(&job->explicit_sync)) ||
>> -         need_ctx_switch || amdgpu_vm_need_pipeline_sync(ring, job))) {
>> -
>> +         (amdgpu_sriov_vf(adev) && need_ctx_switch) ||
>
> Do we need this context switch only in the SRIOV case ("amdgpu_sriov_vf(adev)"), or in normal bare-metal use cases as well?
>
>> +         amdgpu_vm_need_pipeline_sync(ring, job))) {
>>           need_pipe_sync = true;
>>
>>           if (tmp)
>
> If yes, could we split this patch into two: 1. the actual revert, 2. the part below as new changes?
No, splitting it into two would be incorrect since the two decisions depend on each other.

>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index b5ddfcbbc9fc..5f0f9e4beea9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -689,7 +689,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
>>       patch = amdgpu_ring_init_cond_exec(ring, ring->cond_exe_gpu_addr);
>>
>> -    if (need_pipe_sync)
>> +    if (need_pipe_sync || cleaner_shader_needed)
>
> Wasn't this pipeline synchronization originally meant for synchronization between GPU jobs, and not for client-level switching? May I know why it is now OR'ed with "cleaner_shader_needed"? Do we have use cases where we don't need a pipeline sync between jobs, but need to emit one only when "cleaner_shader_needed" is set (i.e., for the new enforce_isolation feature)? In that case, wouldn't we be skipping the job-level pipeline synchronization within a client? And do we foresee any synchronization/disruption issues between jobs of the same client with the new enforce_isolation feature?

Pipeline sync means we wait for the pipeline to be idle (except for ourselves). This is necessary when one job depends on the results of another, under SRIOV when the ring switches from one context to another, or when a VMID is reused and we have a specific bug in the compute engine. It is also necessary while we switch GDS, OA or GWS resources, or when we want to run the cleaner shader.

I will update the code to make that a bit easier to understand.

Regards,
Christian.

>
> Best regards,
>
> Srini
>
>>           amdgpu_ring_emit_pipeline_sync(ring);
>>
>>       if (cleaner_shader_needed)