On 2024-08-29 18:16, Philip Yang wrote:
>
> On 2024-08-29 17:15, Felix Kuehling wrote:
>> On 2024-08-23 15:49, Philip Yang wrote:
>>> If a GPU reset kicks in while the KFD restore_process_worker is running,
>>> it may cause various issues, for example the RCU stall warning below,
>>> because restore work may move BOs and evict queues under VRAM pressure.

On 2024-08-28 18:01, Felix Kuehling wrote:
> On 2024-08-23 15:49, Philip Yang wrote:
>> If a GPU reset kicks in while the KFD restore_process_worker is running,
>> [...]

If a GPU reset kicks in while the KFD restore_process_worker is running,
it may cause various issues, for example the RCU stall warning below,
because restore work may move BOs and evict queues under VRAM pressure.
Fix this race by taking the adev reset_domain read semaphore to prevent a
GPU reset in restore_pro