[AMD Official Use Only]

In the previous discussion, you all suggested that we drop the 
kthread_should_park() check in cleanup_job:

@@ -676,15 +676,6 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
{
        struct drm_sched_job *job, *next;

-       /*
-        * Don't destroy jobs while the timeout worker is running  OR thread
-        * is being parked and hence assumed to not touch pending_list
-        */
-       if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
-           !cancel_delayed_work(&sched->work_tdr)) ||
-           kthread_should_park())
-               return NULL;

But now I have a question: if cleanup_job returns the timed-out job regardless 
of kthread_should_park(), then we are back to the original problem: the 
timed-out job suddenly signals, cleanup_job still returns it to sched_main, and 
the job is freed while the vendor's timeout callback is still handling it.

If cleanup_job returns NULL when kthread_should_park() is true, we can prevent 
the scenario above: once a job is being processed by job_timedout, we stop its 
scheduler. After that, even if the job suddenly signals, cleanup_job won't 
return it, so sched_main won't free it in parallel.

What do you think?
Thanks

------------------------------------------
Monk Liu | Cloud-GPU Core team
------------------------------------------

From: Liu, Monk
Sent: Wednesday, September 1, 2021 9:23 AM
To: Koenig, Christian <christian.koe...@amd.com>; Grodzovsky, Andrey 
<andrey.grodzov...@amd.com>; Daniel Vetter <dan...@ffwll.ch>; Chen, JingWen 
<jingwen.ch...@amd.com>
Cc: DRI Development <dri-de...@lists.freedesktop.org>; 
amd-gfx@lists.freedesktop.org
Subject: [diagnostic TDR mode patches] unify our solution opinions/suggestions 
in one thread


[AMD Official Use Only]

Hi Daniel/Christian/Andrey

It looks to me like the feedback from the three of you is spread across a flood 
of emails. The feature we are working on (the diagnostic TDR scheme) has been 
pending for more than six months (we started it in February 2021).

Honestly, the way we are using email right now is not friendly and is quite 
painful for me.
Can we try to put all our opinions, suggestions, and even objections together 
here and go through them one by one? It's too hard for us to reply to each 
email on a different question.

For [PATCH 1/2] drm/sched: fix the bug of time out calculation(v4)

This patch fixes the timeout timer in the scheduler. Can we complete this one 
first? It should already address all the questions and suggestions.

For [PATCH 2/2] drm/sched: serialize job_timeout and scheduler

I think I already answered Daniel's questions in the other thread about why I 
use __kthread_should_park().
For the other aspects, can we collect all our opinions here?

Thanks!
