[Public]

> From: Koenig, Christian <christian.koe...@amd.com>
> Sent: Wednesday, November 13, 2024 3:49
> Am 12.11.24 um 19:16 schrieb Li, Yunxiang (Teddy):
> >> From: Christian König <ckoenig.leichtzumer...@gmail.com>
> >> Sent: Tuesday, November 12, 2024 5:54
> >> Am 10.11.24 um 16:41 schrieb Yunxiang Li:
> >>> @@ -2612,7 +2707,6 @@ void amdgpu_vm_fini(struct amdgpu_device
> >>> *adev, struct amdgpu_vm *vm)
> >>>
> >>>      root = amdgpu_bo_ref(vm->root.bo);
> >>>      amdgpu_bo_reserve(root, true);
> >>> -   amdgpu_vm_put_task_info(vm->task_info);
> >>>      amdgpu_vm_set_pasid(adev, vm, 0);
> >>>      dma_fence_wait(vm->last_unlocked, false);
> >>>      dma_fence_put(vm->last_unlocked); @@ -2660,6 +2754,15 @@ void
> >>> amdgpu_vm_fini(struct amdgpu_device *adev,
> >> struct amdgpu_vm *vm)
> >>>              }
> >>>      }
> >>>
> >>> +   if (!amdgpu_vm_stats_is_zero(vm)) {
> >>> +           struct amdgpu_task_info *ti = vm->task_info;
> >>> +
> >>> +           dev_warn(adev->dev,
> >>> +                    "VM memory stats for proc %s(%d) task %s(%d) is
> >>> + non-zero
> >> when fini\n",
> >>> +                    ti->process_name, ti->pid, ti->task_name, ti->tgid);
> >>> +   }
> >>> +
> >>> +   amdgpu_vm_put_task_info(vm->task_info);
> >> Please don't move the call to amdgpu_vm_put_task_info().
> > Is keeping the task_info alive a hazard here? I could copy out the info, it 
> > just
> seemed a bit wasteful.
>
> Ah, now I see why you have moved that.
>
> IIRC we need to free up the task info before releasing the PASID, but that 
> info might
> be outdated. Need to check the code.
>
> Does it work if you move the message further up or does the root PD then break
> your neck because it isn't released yet?
>
> Thanks,
> Christian.

It needs to be after root BO is deleted. I think there's a way to go from pasid 
to task_info but not the other way around, so it should be safe? It's okay if 
the pasid/pid/etc gets recycled before we get here and we print outdated info 
since it's just so we know which application we should use to try to reproduce 
the bug.

Teddy

Reply via email to