On 08.09.25 15:51, Alex Deucher wrote:
> On Mon, Sep 8, 2025 at 8:54 AM Christian König <christian.koe...@amd.com>
> wrote:
>>
>> On 05.09.25 20:39, Liu, Shaoyun wrote:
>>> [AMD Official Use Only - AMD Internal Distribution Only]
>>>
>>> I can confirm that during world switch the entire gfx block (including gfx,
>>> compute and sdma for gfx10+) gets switched together.
>>
>> Yeah, but that simply doesn't work as expected.
>>
>> The problem is that the world switch can't preempt running gfx shaders, and
>> can preempt compute shaders only when CWSR is available.
>>
>> Now what world switch currently does is to wait for the gfx draw to finish,
>> then pause the gfx queue and then the compute queues.
>>
>> When gfx starts first that approach works, but when the compute queue runs
>> first we then try to preempt a compute queue which is waiting for the gfx
>> draw to start.
>>
>> Since we don't have CWSR for this compute queue this results in a lockup at
>> the moment.
>
> Compute queues can still preempt without CWSR, it's just dispatch
> level (like gfx) rather than instruction level preemption.

Yeah, exactly that's the problem.

It can happen that the compute dispatch is already running and waiting for
the gfx dispatch to start. But the gfx dispatch will never start because we
are world switching and gfx has already been preempted.

If you preempt gfx first it can happen that compute dispatch waits for gfx.
If you preempt compute first it can happen that gfx dispatch waits for
compute.

As far as I can see that is unsolvable with the current approach. We would
either need CWSR for mesh shader compute dispatches or a global barrier for
each submission.

Otherwise the hypervisor simply lacks the information and handling for
executing a world switch when gang submit is used.

Regards,
Christian.

>
> Alex
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards
>>> Shaoyun.liu
>>>
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Alex
>>> Deucher
>>> Sent: Friday, September 5, 2025 9:32 AM
>>> To: Christian König <ckoenig.leichtzumer...@gmail.com>
>>> Cc: Deucher, Alexander <alexander.deuc...@amd.com>;
>>> amd-gfx@lists.freedesktop.org; timur.kris...@gmail.com
>>> Subject: Re: [PATCH 2/2] drm/amdgpu: reject gang submissions under SRIOV
>>>
>>> On Fri, Sep 5, 2025 at 8:47 AM Christian König
>>> <ckoenig.leichtzumer...@gmail.com> wrote:
>>>>
>>>> Gang submission means that the kernel driver guarantees that multiple
>>>> submissions are executed on the HW at the same time on different engines.
>>>>
>>>> Background is that those submissions then depend on each other and
>>>> each can't finish stand alone.
>>>>
>>>> SRIOV now uses world switch to preempt submissions on the engines to
>>>> allow sharing the HW resources between multiple VFs.
>>>>
>>>> The problem is now that the SRIOV world switch can't know about such
>>>> inter dependencies and will cause a timeout if it waits for a
>>>> partially running gang submission.
>>>>
>>>> To conclude SRIOV and gang submissions are fundamentally incompatible
>>>> at the moment.
>>>> For now just disable them.
>>>
>>> Are you sure about this? Thinking about this more, most gang submissions
>>> are between gfx and compute. The entire GC block (gfx, compute, and sdma
>>> on gfx10+) gets preempted on world switch so all of the active queues would
>>> be preempted. Everything gets resumed when the VF gets switched back.
>>> VCN/JPEG gets switched independently so that could be a problem if you have
>>> a gang with VCN and GC, but I think all gangs within GC should in theory be
>>> ok.
>>>
>>> Alex
>>>
>>>>
>>>> Signed-off-by: Christian König <christian.koe...@amd.com>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> index 2ac9729e4c86..434a551365c7 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> @@ -286,7 +286,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
>>>>                 }
>>>>         }
>>>>
>>>> -       if (!p->gang_size) {
>>>> +       if (!p->gang_size || (amdgpu_sriov_vf(p->adev) && p->gang_size > 1)) {
>>>>                 ret = -EINVAL;
>>>>                 goto free_all_kdata;
>>>>         }
>>>> --
>>>> 2.43.0
>>>>
>>
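
For readers who want to see what the changed condition in the quoted patch accepts and rejects, here is a minimal user-space sketch. It is not the kernel code: `struct cs_model` and `check_gang_size()` are hypothetical names made up for illustration, and the `is_sriov_vf` flag merely stands in for the result of `amdgpu_sriov_vf(p->adev)`. Only the boolean condition itself is taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the two pieces of state the patched check
 * reads. This is NOT struct amdgpu_cs_parser; it exists only so the
 * condition can be exercised outside the kernel.
 */
struct cs_model {
	unsigned int gang_size; /* number of jobs in the submission */
	bool is_sriov_vf;       /* models amdgpu_sriov_vf(p->adev) */
};

/*
 * Mirrors the condition from the patch: an empty submission is always
 * rejected, and a submission with more than one gang member is rejected
 * when running as an SRIOV VF.
 * Returns 0 when the submission would pass this check, -EINVAL otherwise.
 */
int check_gang_size(const struct cs_model *p)
{
	if (!p->gang_size || (p->is_sriov_vf && p->gang_size > 1))
		return -EINVAL;

	return 0;
}
```

Under these assumptions a single-job submission is still accepted on a VF, so only true gangs (two or more jobs) are affected, while bare-metal behavior is unchanged.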