[Public]

> @@ -6098,7 +6097,8 @@  static int amdgpu_device_halt_activities(struct amdgpu_device *adev,
>       /* We need to lock reset domain only once both for XGMI and single device */
>       tmp_adev = list_first_entry(device_list_handle, struct amdgpu_device,
>                                   reset_list);
> -     amdgpu_device_lock_reset_domain(tmp_adev->reset_domain);
> +     if (!test_bit(AMDGPU_HOST_FLR, &reset_context->flags))
> +             amdgpu_device_lock_reset_domain(tmp_adev->reset_domain);
>
>       /* block all schedulers and reset given job's ring */
>       list_for_each_entry(tmp_adev, device_list_handle, reset_list) {

The host should be waiting for amdgpu_virt_ready_to_reset before it resets, 
which happens after amdgpu_device_halt_activities, so I think the lock here is 
fine. Is the host-side wait timing out for you? If so, the root cause should be 
that we take too long to halt guest activity.

Teddy
