Re: [PATCH v2 1/2] drm/amdkfd: Fix some double free when destroy queue fails

2021-06-17 Thread Felix Kuehling
On 2021-06-17 at 8:41 a.m., Pan, Xinhui wrote:
> Felix
> What I am wondering is: if the CP hangs, can we assume all user-mode
> queues have stopped?
> If so, we can do the cleanup work regardless of the return value of
> execute_queues_cpsch().

Right. That's what we currently do with ETIME, which hap

Re: [PATCH v2 1/2] drm/amdkfd: Fix some double free when destroy queue fails

2021-06-17 Thread Pan, Xinhui
Felix
What I am wondering is: if the CP hangs, can we assume all user-mode queues have stopped? If so, we can do the cleanup work regardless of the return value of execute_queues_cpsch().

> On 17 June 2021 at 20:11, Pan, Xinhui wrote:
>
> Felix
> What I have in mind, below, looks simpler. :)
>
> @
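The idea being discussed, freeing the queue even when execute_queues_cpsch() fails because the CP is hung, can be illustrated with a toy model. Everything below (toy_queue, toy_dqm, toy_execute_queues(), toy_destroy_queue(), the nr_freed counter, and the choice of -ETIME as the hang error) is hypothetical and only stands in for the real amdkfd code. It is a sketch under the assumption that a hung CP really has stopped all user-mode queues, showing why a single, unconditional free cannot double free.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the KFD queue manager.
 * None of these names are real amdkfd types or functions. */
struct toy_queue {
	struct toy_queue *next;
	int id;
};

struct toy_dqm {
	struct toy_queue *queues; /* active queues, singly linked */
	bool is_hws_hang;         /* set once the CP/HWS is detected as hung */
	int nr_freed;             /* bookkeeping so callers can observe frees */
};

/* Stand-in for execute_queues_cpsch(): a hung scheduler is assumed to
 * report -ETIME (preemption timeout). */
static int toy_execute_queues(struct toy_dqm *dqm)
{
	return dqm->is_hws_hang ? -ETIME : 0;
}

/* Remove q from the active list if it is present. */
static void toy_unlink(struct toy_dqm *dqm, struct toy_queue *q)
{
	struct toy_queue **pp = &dqm->queues;

	while (*pp && *pp != q)
		pp = &(*pp)->next;
	if (*pp)
		*pp = q->next;
}

/* "Clean up regardless of the return value": if the CP is hung, assume all
 * user-mode queues have stopped, so freeing the queue is safe. The error is
 * still reported, but the queue is freed exactly once either way. */
static int toy_destroy_queue(struct toy_dqm *dqm, struct toy_queue *q)
{
	int retval;

	assert(q != NULL);
	toy_unlink(dqm, q); /* prevent rescheduling after preemption */
	retval = toy_execute_queues(dqm);
	free(q);            /* single free, success or failure */
	dqm->nr_freed++;
	return retval;
}
```

The design choice here is that ownership of the queue ends inside the destroy call in both outcomes, so the caller must treat the queue as gone even when an error is returned.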

Re: [PATCH v2 1/2] drm/amdkfd: Fix some double free when destroy queue fails

2021-06-17 Thread Pan, Xinhui
Felix
What I have in mind, below, looks simpler. :)

@@ -1501,6 +1501,11 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
 	/* remove queue from list to prevent rescheduling after preemption */
 	dqm_lock(dqm);
+	if (dqm->is_hws_hang) {
+

[PATCH v2 1/2] drm/amdkfd: Fix some double free when destroy queue fails

2021-06-17 Thread xinhui pan
Handle queue destroy failure while the CP is hung. Once the CP hangs, KFD triggers a GPU reset and sets the related flags to stop the driver from touching the queue. Since we leave the queue as it is, we need to keep its resources as they are too, regardless of whether user space tries to destroy the queue again. We need to put the queue back
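The approach in this patch, keeping the queue and its resources when destruction fails, can be sketched with a similar toy model. The names below (toy_queue, toy_dqm, toy_push(), toy_unlink(), toy_count(), toy_destroy_queue()) are hypothetical stand-ins, not amdkfd code, and the -EIO return value is an assumption. On failure the queue is linked back onto the list and nothing is freed, so it is freed exactly once, by a later destroy attempt that succeeds (here, after the hang flag is cleared to mimic a completed GPU reset).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the real amdkfd structures. */
struct toy_queue {
	struct toy_queue *next;
	int id;
};

struct toy_dqm {
	struct toy_queue *queues; /* active queues, singly linked */
	bool is_hws_hang;         /* set while the CP/HWS is hung */
	int nr_freed;             /* counts frees so callers can check them */
};

/* Add q to the front of the active list. */
static void toy_push(struct toy_dqm *dqm, struct toy_queue *q)
{
	q->next = dqm->queues;
	dqm->queues = q;
}

/* Remove q from the active list if it is present. */
static void toy_unlink(struct toy_dqm *dqm, struct toy_queue *q)
{
	struct toy_queue **pp = &dqm->queues;

	while (*pp && *pp != q)
		pp = &(*pp)->next;
	if (*pp)
		*pp = q->next;
}

/* Number of queues currently on the active list. */
static int toy_count(const struct toy_dqm *dqm)
{
	int n = 0;

	for (const struct toy_queue *q = dqm->queues; q; q = q->next)
		n++;
	return n;
}

/* Destroy a queue, but if the hardware is hung, leave the queue (and its
 * resources) as they are: link it back onto the list and report failure. */
static int toy_destroy_queue(struct toy_dqm *dqm, struct toy_queue *q)
{
	assert(q != NULL);
	toy_unlink(dqm, q);   /* remove to prevent rescheduling */
	if (dqm->is_hws_hang) {
		toy_push(dqm, q); /* put the queue back; nothing is freed */
		return -EIO;      /* assumed error code */
	}
	free(q);
	dqm->nr_freed++;
	return 0;
}
```

Compared with freeing unconditionally, this keeps the queue reachable after a failed destroy, so a retry (from user space or at process teardown) finds it on the list instead of touching freed memory.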