amdgpu: prevent gpu access during reset recovery

Christian König Fri, 24 May 2024 00:49:18 -0700

On 23.05.24 at 17:35, Li, Yunxiang (Teddy) wrote:

[Public]

This takes a different lock than reset_domain->sem. It is a separate 
reset_domain->gpu_sem that is only locked when we will actually do a reset; it is not 
taken in the skip_hw_reset path.

That is exactly what you should *not* do. Please don't add any new lock to the 
code. This is already complicated enough.

If you think that adding wrappers for the reset lock makes sense then we can 
probably do that, but don't add any lock for hw access.

The two locks protect very different things though. The first case is we need 
to block two resets running in parallel,

No, that's not correct. Two parallel resets are prevented by using a worker queue for the resets.

The reset lock is here exactly to give external threads the opportunity to make their operations mutually exclusive with the reset.

  this does not only cover the GPU reset itself but also any teardown that happens 
before the GPU reset. The second case is we need to ensure exclusive access to the 
GPU between GPU reset and GPU init; concurrent access is fine before the GPU is 
reset.

If that is true you could in theory narrow the region covered by the existing lock, but adding a new one is a strict no-go from my side.


Regards,
Christian.


Theoretically, the second case happens within the first case, so locking the first 
case would protect against both. But with the current implementation this is 
infeasible: all the generic functions called between 
amdgpu_device_lock/unlock_reset_domain would need to be swapped out for special 
versions so the reset thread does not deadlock itself. It is much simpler to have a 
second, much narrower lock that only covers GPU reset<->GPU init because all 
the accesses there are very low level anyway.

Teddy

Re: [PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery
