On Tue, Dec 09, 2025 at 03:28:49PM +0100, Philipp Stanner wrote:
> On Tue, 2025-12-09 at 15:19 +0100, Christian König wrote:
> > On 12/9/25 14:51, Philipp Stanner wrote:
> > ...
> > > > > > > > > > How can free_job_work, through
> > > > > > > > > > drm_sched_get_finished_job(), get and
> > > > > > > > > > free the same job?
> > > > > > > > > >
> > > > > > > > > It can't.
> > > > > > > > >
> > > > > > > But exactly that happens somehow. Don't ask me how, I have no
> > > > > > > idea.
> > > > > > >
> > > > > *Philipp refuses to elaborate and asks Christian*
> > > > >
> > > > > How are you so sure about that's what's happening? Anyways, assuming
> > > > > it
> > > > > is true:
> > > > >
> > > > [ 489.134585]
> > > > ==================================================================
> > > > [ 489.141949] BUG: KASAN: slab-use-after-free in
> > > > amdgpu_device_gpu_recover+0x968/0x990 [amdgpu]
> > > > [ 489.151339] Read of size 4 at addr ffff88a0d5f4214c by task
> > > > kworker/u128:0/12
> > > > [ 489.158686]
> > > > [ 489.160277] CPU: 11 UID: 0 PID: 12 Comm: kworker/u128:0 Tainted: G
> > > > E 6.16.0-1289896.3.zuul.0ec208edc00d48a9bae1719675cb777f
> > > > #1 PREEMPT(voluntary)
> > > > [ 489.160285] Tainted: [E]=UNSIGNED_MODULE
> > > > [ 489.160288] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS
> > > > V1.03.B10 04/01/2019
> > > > [ 489.160292] Workqueue: amdgpu-reset-dev drm_sched_job_timedout
> > > > [gpu_sched]
> > > > [ 489.160306] Call Trace:
> > > > [ 489.160308] <TASK>
> > > > [ 489.160311] dump_stack_lvl+0x64/0x80
> > > > [ 489.160321] print_report+0xce/0x630
> > > > [ 489.160328] ? _raw_spin_lock_irqsave+0x86/0xd0
> > > > [ 489.160333] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> > > > [ 489.160337] ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu]
> > > > [ 489.161044] kasan_report+0xb8/0xf0
> > > > [ 489.161049] ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu]
> > > > [ 489.161756] amdgpu_device_gpu_recover+0x968/0x990 [amdgpu]
> > > > [ 489.162464] ? __pfx_amdgpu_device_gpu_recover+0x10/0x10 [amdgpu]
> > > > [ 489.163170] ? amdgpu_coredump+0x1fd/0x4c0 [amdgpu]
> > > > [ 489.163904] amdgpu_job_timedout+0x642/0x1400 [amdgpu]
> > > > [ 489.164698] ? __pfx__raw_spin_lock+0x10/0x10
> > > > [ 489.164703] ? __pfx_amdgpu_job_timedout+0x10/0x10 [amdgpu]
> > > > [ 489.165496] ? _raw_spin_lock+0x75/0xc0
> > > > [ 489.165499] ? __pfx__raw_spin_lock+0x10/0x10
> > > > [ 489.165503] drm_sched_job_timedout+0x1b0/0x4b0 [gpu_sched]
> > >
> > > That doesn't show that it's free_job() who freed the memory.
> >
> > [ 489.405936] Freed by task 2501:
> > [ 489.409175] kasan_save_stack+0x20/0x40
> > [ 489.413122] kasan_save_track+0x14/0x30
> > [ 489.417064] kasan_save_free_info+0x3b/0x60
> > [ 489.421355] __kasan_slab_free+0x37/0x50
> > [ 489.425384] kfree+0x1fe/0x3f0
> > [ 489.428547] drm_sched_free_job_work+0x50e/0x930 [gpu_sched]
> > [ 489.434326] process_one_work+0x679/0xff0
>
> The time stamp shows that this free here took place after the UAF
> occurred :D
>
> > >
> > > @Vitaly: Can you reproduce the bug? If yes, adding debug prints
> > > printing the jobs' addresses when allocated and when freed in
> > > free_job() could be a solution.
> >
> > We can reproduce this pretty reliable in our CI now.
> >
> > > I repeat, we need more info :)
> >
> > > > > > >
> > > > > > > My educated guess is that the job somehow ends up on the pending
> > > > > > > list again.
> > > > >
> > > > > then the obvious question would be: does amdgpu touch the pending_list
> > > > > itself, or does it only ever modify it through proper scheduler APIs?
> > > >
> > > > My educated guess is that drm_sched_stop() inserted the job back into
> > > > the pending list, but I still have no idea how it is possible that
> > > > free_job is running after the scheduler is stopped.
I believe I found your problem, referencing amdgpu/amdgpu_device.c here:

6718         if (job)
6719                 ti = amdgpu_vm_get_task_info_pasid(adev, job->pasid);

Which is after:

6695 skip_hw_reset:
6696         r = amdgpu_device_sched_resume(&device_list, reset_context, job_signaled);
6697         if (r)
6698                 goto reset_unlock;

The job is likely added back into the free list here:

6676         amdgpu_device_halt_activities(adev, job, reset_context, &device_list,
6677                                       hive, need_emergency_restart);

So free_job runs and 'job->pasid' explodes. Save off the pasid on the stack
at the top of this function and I suspect your UAF goes away. This won't
untangle this hairball of code, but I believe it at least prevents the
explosions. (A rough, untested sketch of what I mean follows below the
quoted text.)

Matt

> > >
> > > And my uneducated guess is that it's happening in amdgpu. It seems a
> > > sched_job lives inside an amdgpu_job. Can the latter be freed at other
> > > places than free_job()?
> >
> > Nope, except for error handling during creation and initialization.
> >
> > > timedout_job() and free_job() cannot race against each other regarding
> > > jobs. It's locked.
> > >
> > > But maybe investigate Matthew's suggestion and look into the guilty
> > > mechanism, too.
> >
> > That looks just like a leftover from earlier attempts to fix the same
> > problem.
> >
> > I mean look at the git history of how often that problem came up...
>
> If that's the case, then we don't want to yet add another solution to a
> problem we don't fully understand and which, apparently, only occurs in
> amdgpu today.
>
> What we need is an analysis of what's happening. Only then can we
> decide what to do.
>
> Just switching the workqueues without such good justification receives
> a NACK from me; also because of the unforseeable consequences –
> free_job() is invoked extremely frequently, timedout_job() very rarely.
> Drivers will not expect that their timeout_wq will be flooded with so
> many work items. That could very certainly change behavior, cause
> performance regressions and so on.
>
> P.
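Roughly what I have in mind, as an untested sketch; the exact surrounding
context in amdgpu_device_gpu_recover() and the u32 type of the pasid field are
assumptions based only on the lines quoted above:

	/* Near the top of amdgpu_device_gpu_recover(): snapshot the pasid
	 * while @job is still guaranteed to be alive, i.e. before
	 * amdgpu_device_halt_activities() can hand the job back to the
	 * scheduler and free_job() gets a chance to run.
	 */
	u32 job_pasid = job ? job->pasid : 0;

	...

	amdgpu_device_halt_activities(adev, job, reset_context, &device_list,
				      hive, need_emergency_restart);
	...

skip_hw_reset:
	r = amdgpu_device_sched_resume(&device_list, reset_context, job_signaled);
	if (r)
		goto reset_unlock;
	...

	/* Use the snapshot instead of dereferencing the possibly freed job. */
	if (job)
		ti = amdgpu_vm_get_task_info_pasid(adev, job_pasid);

The point is only that nothing after amdgpu_device_halt_activities()
dereferences *job any more; per the quoted line 6719, the pasid is all the
task-info lookup needs.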

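As for Philipp's earlier suggestion of printing the jobs' addresses when they
are allocated and when they are freed in free_job(): a minimal sketch of that
instrumentation could look like the below. The exact spots in amdgpu's job
allocation path and in its free_job callback are assumptions on my part; what
matters is correlating the printed pointers with the object address in the
KASAN report.

	/* In the driver's job allocation path, once the job exists: */
	pr_info("amdgpu: job %p allocated\n", job);

	/* In the driver's free_job callback, right before the job is released: */
	pr_info("amdgpu: job %p freed via free_job()\n", job);

If the address from the KASAN report shows up in free_job() before the timeout
handler is done with that job, that would confirm where the use-after-free
comes from.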