On Thu, Mar 27, 2025 at 9:50 AM Christian König
<christian.koe...@amd.com> wrote:
>
> On 27.03.25 at 10:37, SRINIVASAN SHANMUGAM wrote:
>
> On 3/27/2025 2:54 PM, Christian König wrote:
>
> Overall, this change doesn't seem to make much sense to me.
>
> Why exactly is isolation->spearhead not pointing to the dummy kernel
> job we submit?
>
> Does the owner check or gang_submit check in
> amdgpu_device_enforce_isolation() fail to set up the spearhead?
>
> I'm currently debugging exactly that.
>
> Good news is that I can reproduce the problem.
>
>
> I have to take that back. I've tested the cleaner shader functionality
> a bit this morning and as far as I can see this works exactly as
> intended.
>
> Srini, what exactly is your use case which doesn't work?
>
> Hi Christian, Good Morning!
>
> The use case is to trigger the cleaner shader via the sysfs entry
> "run_cleaner_shader", independent of enabling "enforce_isolation", so
> that the cleaner shader packet gets submitted to the COMP_1.0.0 ring by
> default, without first enabling any enforce_isolation via sysfs.
>
>
> I've tested exactly that and it seems to work perfectly fine:
>
> kworker/u96:1-209 [020] ..... 86.655999: amdgpu_isolation: prev=0000000000000000, next=ffffffffffffffff
> kworker/u96:1-209 [020] ..... 86.656190: amdgpu_cleaner_shader: ring=gfx_0.0.0, seqno=2
> <...>-11 [022] ..... 150.607688: amdgpu_isolation: prev=ffffffffffffffff, next=0000000000000000
> kworker/u96:0-11 [022] ..... 150.608228: amdgpu_cleaner_shader: ring=comp_1.0.0, seqno=2
> kworker/u96:0-11 [022] ..... 150.620597: amdgpu_isolation: prev=0000000000000000, next=ffffffffffffffff
> kworker/u96:0-11 [022] ..... 150.620624: amdgpu_cleaner_shader: ring=gfx_0.0.0, seqno=1527
>
>
> The only thing which might be confusing is that when you issue the
> cleaner shader multiple times while the GPU is idle, it only runs once.
>
> But that should be easy to change if necessary.

The problem is that it doesn't take KFD jobs into account. We need to
be able to run the cleaner shader even if there have been no KGD jobs.

Alex

>
> Regards,
> Christian.
>
> AFAIK, this "isolation->spearhead" initialization is not taken care of
> in the path "amdgpu_gfx_run_cleaner_shader ->
> amdgpu_gfx_run_cleaner_shader_job" (i.e., when we trigger the cleaner
> shader using the "run_cleaner_shader" sysfs entry). The problem is in
> the check "&job->base.s_fence->scheduled == isolation->spearhead;":
> the address "&job->base.s_fence->scheduled" does not match the
> "isolation->spearhead" address, so the check evaluates to zero and the
> cleaner shader is not emitted in the "amdgpu_vm_flush()" function when
> it is run via the "run_cleaner_shader" sysfs entry.
>
> Best regards,
>
> Srini
>
>
> Regards,
> Christian.
>
> Regards,
> Christian.
>
>
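
For readers following the thread, the pointer comparison Srini describes can be modeled in a few lines of standalone C. This is only a simplified sketch and not the amdgpu code: the struct layouts and the helper name job_is_spearhead() are hypothetical, and the real kernel structures carry far more state. Only the comparison itself is taken from the message above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types, reduced to what the check needs. */
struct fence { int unused; };

struct sched_fence { struct fence scheduled; };

/* Models job->base.s_fence from the thread; the "base" level is flattened away. */
struct job { struct sched_fence *s_fence; };

struct isolation { struct fence *spearhead; };

/*
 * Models the comparison quoted in the thread:
 *   &job->base.s_fence->scheduled == isolation->spearhead
 * Returns true only when spearhead points at this job's scheduled fence.
 */
static bool job_is_spearhead(const struct job *job, const struct isolation *iso)
{
    return &job->s_fence->scheduled == iso->spearhead;
}
```

In this model, if spearhead was never pointed at the job's scheduled fence (for instance, if it is still NULL), the helper returns false for that job. That corresponds to the symptom described above, where the comparison evaluates to zero and no cleaner shader is emitted. Once spearhead holds the address of the job's scheduled fence, the same call returns true.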