Thanks for all the explanations. It looks like this part of the scheduler has a lot of bugs.
On Tue, Oct 9, 2018 at 2:55 AM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:
>
> On 08.10.2018 at 19:40, Nayan Deshmukh wrote:
> > Hi Christian,
> >
> > I am a bit confused by this patch. It would help if you could explain
> > what these 3 functions are supposed to do. From my understanding this
> > is what they are supposed to do:
> >
> > 1. drm_sched_job_timedout: This function is called when a job has been
> > executing for more than "timeout" seconds. This function needs to
> > identify the offending job and call the timedout_job function, which
> > will notify the driver of that job, and the GPU should be recovered to
> > its original state.
> >
> > 2. drm_sched_hw_job_reset: stop the scheduler if it contains the bad
> > job. ATM this function first removes all the callbacks and goes
> > through the ring_mirror_list afterward to find the bad job. I think it
> > should do it in the opposite order. Or my understanding of this
> > function is not fully correct.
>
> That function actually has a couple of bugs itself, but they are
> irrelevant for the current patch.
>
> > 3. drm_sched_job_recovery: recover jobs after a reset. It resubmits
> > the jobs to the hardware queue, avoiding the guilty job.
>
> That function is completely unrelated and only called after recovering
> from a hard reset.
>
> >
> > Some other questions and suggestions:
> >
> > 1. We can avoid the race condition altogether if we shift the
> > canceling and rescheduling of the timedout work item to
> > drm_sched_process_job().
>
> That won't work. drm_sched_process_job() is called from interrupt
> context and can't sync with a timeout worker.
>
> > 2. How does removing and adding the callback help with the race
> > condition? Moreover, the hardware might execute some jobs while we are
> > in this function, leading to more races.
>
> We need to make sure that the underlying hardware fence doesn't signal
> and trigger new processing while we are about to call the driver's
> timeout function to reset the hardware.
>
> This is done by removing the callbacks in reverse order (i.e. newest to
> oldest).
>
> If we find that we can't remove a callback because the hardware has
> actually continued and the fence has already signaled, we add the
> callbacks back again in normal order (i.e. oldest to newest), starting
> from the job which had already signaled.

Why do we need to call drm_sched_process_job() again for the last
signaled job?

Regards,
Nayan

> >
> > 3. What is the order in which the above 3 functions should be
> > executed by the hardware? I think the answer to this question might
> > clear a lot of my doubts.
>
> If we can quickly recover from a problem, only drm_sched_job_timedout()
> should be executed.
>
> The other two are for hard resets, where we need to stop multiple
> scheduler instances and get them running again.
>
> That the karma handling is mixed into that is rather unfortunate and
> actually quite buggy as well.
>
> I should probably also clean that up.
>
> Regards,
> Christian.
>
> >
> > Regards,
> > Nayan
> >
> > On Mon, Oct 8, 2018 at 8:36 PM Christian König
> > <ckoenig.leichtzumer...@gmail.com> wrote:
> >> We need to make sure that we don't race between job completion and
> >> timeout.
> >>
> >> Signed-off-by: Christian König <christian.koe...@amd.com>
> >> ---
> >>  drivers/gpu/drm/scheduler/sched_main.c | 28 +++++++++++++++++++++++++++-
> >>  1 file changed, 27 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >> index bd7d11c47202..ad3c57c9fd21 100644
> >> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >> @@ -248,14 +248,40 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
> >>  static void drm_sched_job_timedout(struct work_struct *work)
> >>  {
> >>         struct drm_gpu_scheduler *sched;
> >> +       struct drm_sched_fence *fence;
> >>         struct drm_sched_job *job;
> >> +       int r;
> >>
> >>         sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
> >> +
> >> +       spin_lock(&sched->job_list_lock);
> >> +       list_for_each_entry_reverse(job, &sched->ring_mirror_list, node) {
> >> +               fence = job->s_fence;
> >> +               if (!dma_fence_remove_callback(fence->parent, &fence->cb))
> >> +                       goto already_signaled;
> >> +       }
> >> +
> >>         job = list_first_entry_or_null(&sched->ring_mirror_list,
> >>                                        struct drm_sched_job, node);
> >> +       spin_unlock(&sched->job_list_lock);
> >>
> >>         if (job)
> >> -               job->sched->ops->timedout_job(job);
> >> +               sched->ops->timedout_job(job);
> >> +
> >> +       spin_lock(&sched->job_list_lock);
> >> +       list_for_each_entry(job, &sched->ring_mirror_list, node) {
> >> +               fence = job->s_fence;
> >> +               if (!fence->parent || !list_empty(&fence->cb.node))
> >> +                       continue;
> >> +
> >> +               r = dma_fence_add_callback(fence->parent, &fence->cb,
> >> +                                          drm_sched_process_job);
> >> +               if (r)
> >> +already_signaled:
> >> +                       drm_sched_process_job(fence->parent, &fence->cb);
> >> +
> >> +       }
> >> +       spin_unlock(&sched->job_list_lock);
> >>  }
> >>
> >>  /**
> >> --
> >> 2.14.1
> >>
> >> _______________________________________________
> >> dri-devel mailing list
> >> dri-devel@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel