On 2021-06-17 at 8:41 a.m., Pan, Xinhui wrote:
> Felix,
> What I am wondering is: if the CP hangs, can we assume all user-mode
> queues have stopped?
> If so, we can do the cleanup work regardless of the retval of
> execute_queues_cpsch().
Right. That's what we currently do with ETIME, which hap
Felix,
What I am wondering is: if the CP hangs, can we assume all user-mode
queues have stopped?
If so, we can do the cleanup work regardless of the retval of execute_queues_cpsch().
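
To make the question concrete, here is a minimal sketch in plain C of what "cleanup regardless of the retval" could look like. It only holds if the answer is yes, i.e. a hung CP really means all user-mode queues have stopped. The names sketch_dqm, sketch_execute_queues() and sketch_destroy_queue() are hypothetical stand-ins, not the kfd code, and the mock simply returns -ETIME whenever the hang flag is set.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device queue manager; not the real struct. */
struct sketch_dqm {
	bool is_hws_hang;        /* set once the CP/HWS is known to be hung */
	int active_queue_count;  /* queues still on the run list */
	bool resources_released; /* records that cleanup ran */
};

/* Mock of execute_queues_cpsch(): reports -ETIME when the CP is hung. */
static int sketch_execute_queues(struct sketch_dqm *dqm)
{
	return dqm->is_hws_hang ? -ETIME : 0;
}

/*
 * Destroy path under the assumption being asked about: if the CP is hung,
 * no user-mode queue is running any more, so cleanup is done whatever the
 * preemption call returned. The error is still reported to the caller.
 */
static int sketch_destroy_queue(struct sketch_dqm *dqm)
{
	int retval;

	/* Remove the queue from the list before asking for preemption. */
	dqm->active_queue_count--;

	retval = sketch_execute_queues(dqm);
	if (retval)
		fprintf(stderr, "preemption failed (%d), cleaning up anyway\n",
			retval);

	/* Cleanup is unconditional; it does not depend on retval. */
	dqm->resources_released = true;

	return retval;
}
```

Calling sketch_destroy_queue() on a manager with is_hws_hang set returns -ETIME but still marks the resources as released, which is the behaviour the question is about.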
> On June 17, 2021, at 20:11, Pan, Xinhui wrote:
>
> Felix,
> What I am thinking of, shown below, looks simpler. :)
>
> @
Felix,
What I am thinking of, shown below, looks simpler. :)
@@ -1501,6 +1501,11 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
 	/* remove queue from list to prevent rescheduling after preemption */
 	dqm_lock(dqm);
+	if (dqm->is_hws_hang) {
+
Handle queue destroy failure while the CP is hung.
Once the CP hangs, kfd triggers a GPU reset and sets the related flags to stop
the driver from touching the queue. As we leave the queue as it is, we need to
keep the resource as it is too.
Regardless of whether user space tries to destroy the queue again, we need
to put the queue back