On Fri, Aug 20, 2021 at 09:20:42AM +0200, Christian König wrote:
> No, that perfectly works for me.
>
> The problem we used to have with this approach was that we potentially have
> multiple timeouts at the same time.
>
> But when we serialize the timeout handling by using a single workqueue as
> suggested by Daniel now as well then that isn't an issue any more.
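[A minimal, illustrative sketch of the "single workqueue" idea referred to above, not code from this thread: all struct and function names below are invented for the example, only alloc_ordered_workqueue(), INIT_WORK() and queue_work() are real kernel APIs. Every ring funnels its timeout/reset handling into one ordered workqueue, so at most one reset handler runs at a time.]

#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/workqueue.h>

/* Hypothetical per-device and per-ring structures, for illustration only. */
struct demo_device {
	struct workqueue_struct *reset_wq;	/* ordered: one reset at a time */
};

struct demo_ring {
	struct demo_device *dev;
	struct work_struct reset_work;
};

static void demo_ring_reset_work(struct work_struct *work)
{
	struct demo_ring *ring = container_of(work, struct demo_ring, reset_work);

	/*
	 * Because reset_wq is ordered, only one of these handlers can run at
	 * a time, even if several rings time out back to back.  Park the
	 * schedulers, reset the hardware, unpark/resubmit here.
	 */
	(void)ring;
}

static int demo_device_reset_init(struct demo_device *dev, struct demo_ring *ring)
{
	/* One ordered workqueue shared by all rings serializes all resets. */
	dev->reset_wq = alloc_ordered_workqueue("demo-reset", 0);
	if (!dev->reset_wq)
		return -ENOMEM;

	ring->dev = dev;
	INIT_WORK(&ring->reset_work, demo_ring_reset_work);
	return 0;
}

static void demo_ring_timedout(struct demo_ring *ring)
{
	/* Every ring's timeout handler funnels into the same ordered queue. */
	queue_work(ring->dev->reset_wq, &ring->reset_work);
}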
Sorry I got massively buried in everything, catching up. Iirc there's a
special function for parking schedulers (which panfrost now uses to handle
its cross-engine reset), would be good to use that. And yeah if your reset
code potentially spans across engines I think you need a single workqueue
to make sure stuff doesn't go boom. Tbh might be best to check out what
panfrost has done and ask panfrost folks for an ack on your approach.
-Daniel

>
> Regards,
> Christian.
>
> Am 20.08.21 um 09:12 schrieb Liu, Monk:
> > [AMD Official Use Only]
> >
> > @Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian
> > Do you have any concern on the kthread_park() approach?
> >
> > Theoretically speaking sched_main shall run there exclusively with
> > job_timeout since they both touch jobs, and stopping the scheduler during
> > job_timeout won't impact performance since in that scenario
> > there was already something wrong/stuck on that ring/scheduler.
> >
> > Thanks
> >
> > ------------------------------------------
> > Monk Liu | Cloud-GPU Core team
> > ------------------------------------------
> >
> > -----Original Message-----
> > From: Liu, Monk
> > Sent: Thursday, August 19, 2021 6:26 PM
> > To: Daniel Vetter <dan...@ffwll.ch>; Grodzovsky, Andrey
> > <andrey.grodzov...@amd.com>
> > Cc: Alex Deucher <alexdeuc...@gmail.com>; Chen, JingWen
> > <jingwen.ch...@amd.com>; Maling list - DRI developers
> > <dri-de...@lists.freedesktop.org>; amd-gfx list
> > <amd-gfx@lists.freedesktop.org>; Koenig, Christian
> > <christian.koe...@amd.com>
> > Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad
> > job."
> >
> > [AMD Official Use Only]
> >
> > Hi Daniel
> >
> > > > Why can't we stop the scheduler thread first, so that there's
> > > > guaranteed no race? I've recently had a lot of discussions with
> > > > panfrost folks about their reset that spans across engines, and
> > > > without stopping the scheduler thread first before you touch anything
> > > > it's just plain impossible.
> > Yeah we had this thought as well in our mind.
> >
> > Our second approach is to call kthread_park() in the job_timedout() routine
> > so that the "bad" job is guaranteed not to be touched or freed by the
> > scheduler. Check this sample patch as well please:
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index a2a9536..50a49cb 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct
> > *work)
> > sched = container_of(work, struct drm_gpu_scheduler,
> > work_tdr.work);
> > /* Protects against concurrent deletion in
> > drm_sched_get_cleanup_job */
> > + kthread_park(sched->thread);
> > spin_lock(&sched->job_list_lock);
> > job = list_first_entry_or_null(&sched->pending_list,
> > struct drm_sched_job, list);
> > if (job) {
> > - /*
> > - * Remove the bad job so it cannot be freed by concurrent
> > - * drm_sched_cleanup_jobs. It will be reinserted back after
> > sched->thread
> > - * is parked at which point it's safe.
> > - */
> > - list_del_init(&job->list);
> > spin_unlock(&sched->job_list_lock);
> > status = job->sched->ops->timedout_job(job);
> > @@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct
> > *work)
> > } else {
> > spin_unlock(&sched->job_list_lock);
> > }
> > + kthread_unpark(sched->thread);
> > if (status != DRM_GPU_SCHED_STAT_ENODEV) {
> > spin_lock(&sched->job_list_lock);
> > @@ -393,20 +389,6 @@
> > void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job
> > *bad)
> > kthread_park(sched->thread);
> > /*
> > - * Reinsert back the bad job here - now it's safe as
> > - * drm_sched_get_cleanup_job cannot race against us and release the
> > - * bad job at this point - we parked (waited for) any in progress
> > - * (earlier) cleanups and drm_sched_get_cleanup_job will not be
> > called
> > - * now until the scheduler thread is unparked.
> > - */
> > - if (bad && bad->sched == sched)
> > - /*
> > - * Add at the head of the queue to reflect it was the
> > earliest
> > - * job extracted.
> > - */
> > - list_add(&bad->list, &sched->pending_list);
> > -
> > - /*
> > * Iterate the job list from later to earlier one and either
> > deactive
> > * their HW callbacks or remove them from pending list if they
> > already
> > * signaled.
> >
> >
> > Thanks
> >
> > ------------------------------------------
> > Monk Liu | Cloud-GPU Core team
> > ------------------------------------------
> >
> > -----Original Message-----
> > From: Daniel Vetter <dan...@ffwll.ch>
> > Sent: Thursday, August 19, 2021 5:31 PM
> > To: Grodzovsky, Andrey <andrey.grodzov...@amd.com>
> > Cc: Daniel Vetter <dan...@ffwll.ch>; Alex Deucher <alexdeuc...@gmail.com>;
> > Chen, JingWen <jingwen.ch...@amd.com>; Maling list - DRI developers
> > <dri-de...@lists.freedesktop.org>; amd-gfx list
> > <amd-gfx@lists.freedesktop.org>; Liu, Monk <monk....@amd.com>; Koenig,
> > Christian <christian.koe...@amd.com>
> > Subject: Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad
> > job."
> >
> > On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote:
> > > On 2021-08-18 10:42 a.m., Daniel Vetter wrote:
> > > > On Wed, Aug 18, 2021 at 10:36:32AM -0400, Andrey Grodzovsky wrote:
> > > > > On 2021-08-18 10:32 a.m., Daniel Vetter wrote:
> > > > > > On Wed, Aug 18, 2021 at 10:26:25AM -0400, Andrey Grodzovsky wrote:
> > > > > > > On 2021-08-18 10:02 a.m., Alex Deucher wrote:
> > > > > > > >
> > > > > > > > + dri-devel
> > > > > > > >
> > > > > > > > Since scheduler is a shared component, please add dri-devel
> > > > > > > > on all scheduler patches.
> > > > > > > >
> > > > > > > > On Wed, Aug 18, 2021 at 7:21 AM Jingwen Chen
> > > > > > > > <jingwen.ch...@amd.com> wrote:
> > > > > > > > > [Why]
> > > > > > > > > for a bailing job, this commit will delete it from the pending
> > > > > > > > > list thus the bailing job will never have a chance to be
> > > > > > > > > resubmitted even in advance tdr mode.
> > > > > > > > >
> > > > > > > > > [How]
> > > > > > > > > after embedded hw_fence into amdgpu_job is done, the race
> > > > > > > > > condition that this commit tries to work around is
> > > > > > > > > completely solved. So revert this commit.
> > > > > > > > > This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90.
> > > > > > > > > v2:
> > > > > > > > > add dma_fence_get/put() around timedout_job to avoid
> > > > > > > > > concurrent delete during processing timedout_job
> > > > > > > > >
> > > > > > > > > Signed-off-by: Jingwen Chen <jingwen.ch...@amd.com>
> > > > > > > > > ---
> > > > > > > > > drivers/gpu/drm/scheduler/sched_main.c | 23
> > > > > > > > > +++++------------------
> > > > > > > > > 1 file changed, 5 insertions(+), 18 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > index a2a953693b45..f9b9b3aefc4a 100644
> > > > > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct
> > > > > > > > > work_struct *work)
> > > > > > > > > {
> > > > > > > > > struct drm_gpu_scheduler *sched;
> > > > > > > > > struct drm_sched_job *job;
> > > > > > > > > + struct dma_fence *fence;
> > > > > > > > > enum drm_gpu_sched_stat status =
> > > > > > > > > DRM_GPU_SCHED_STAT_NOMINAL;
> > > > > > > > >
> > > > > > > > > sched = container_of(work, struct
> > > > > > > > > drm_gpu_scheduler, work_tdr.work);
> > > > > > > > > @@ -325,11 +326,10 @@
> > > > > > > > > static void drm_sched_job_timedout(struct work_struct
> > > > > > > > > *work)
> > > > > > > > >
> > > > > > > > > if (job) {
> > > > > > > > > /*
> > > > > > > > > - * Remove the bad job so it cannot be freed
> > > > > > > > > by concurrent
> > > > > > > > > - * drm_sched_cleanup_jobs. It will be
> > > > > > > > > reinserted back after sched->thread
> > > > > > > > > - * is parked at which point it's safe.
> > > > > > > > > + * Get job->s_fence->parent here to avoid
> > > > > > > > > concurrent delete during
> > > > > > > > > + * processing timedout_job
> > > > > > > > > */
> > > > > > > > > - list_del_init(&job->list);
> > > > > > > > > + fence =
> > > > > > > > > + dma_fence_get(job->s_fence->parent);
> > > > > > > While this is true for amdgpu, it has no meaning for other
> > > > > > > drivers for whom we haven't done the refactoring of embedding
> > > > > > > the HW fence (parent) into the job structure.
> > > > > > > In fact, thinking
> > > > > > > about it, unless you do the HW fence embedding for all the
> > > > > > > drivers using the scheduler you cannot revert this patch or
> > > > > > > you will just break them.
> > > > > > btw, why did you do that embedding? I do still have my patches
> > > > > > with dma_fence annotations floating around, but my idea at least
> > > > > > was to fix that issue with a mempool, not with embedding. What
> > > > > > was the motivation for embedding the hw fence?
> > > > > > -Daniel
> > > > > The motivation was 2-fold: avoid memory allocation during job
> > > > > submission (HW fence allocation) because as Christian explained
> > > > > this leads to deadlock with mm code during evictions due to memory
> > > > > pressure (Christian can clarify if I messed
> > > > Yeah that's the exact same thing I've chased with my dma_fence
> > > > annotations, but thus far zero to no interest in getting it
> > > > sorted. I think it'd be good to have some cross-driver agreement on
> > > > how this should be solved before someone just charges ahead ...
> > > >
> > > > > this explanation).
> > > > > Second is to exactly revert this patch because
> > > > > while it solved the issue described in the patch it created
> > > > > another with drivers who bailed out early during TDR handling for
> > > > > various reasons and the job would just leak because it was already
> > > > > removed from the pending list.
> > > > Can't we reinsert it before we restart the scheduler thread? It
> > > > might need a separate list for that due to the lockless queue
> > > > tricks. Or am I thinking about the wrong kind of "we lost the job"?
> > > > -Daniel
> > >
> > > If you look at the original patch it would reinsert it even earlier -
> > > right after stopping the SW scheduler thread, and even then it was too
> > > late for some drivers as they would decide to return back from their
> > > TDR handler even before that. It is solvable but in an ugly way as far
> > > as I see, you need to require each driver in its code to put the job
> > > back in the list if they do it before reaching the place where the
> > > scheduler framework does it. Kind of spaghetti code, it seems to me.
> > Hm yeah I didn't realize this all happens before we stop the scheduler
> > thread.
> >
> > Why can't we stop the scheduler thread first, so that there's guaranteed no
> > race? I've recently had a lot of discussions with panfrost folks about
> > their reset that spans across engines, and without stopping the scheduler
> > thread first before you touch anything it's just plain impossible.
> >
> > I'm also still not understanding what exactly you guys have done, can
> > someone please dig out the amdgpu patches that motivate all this, maybe
> > that's clearer? A full explanation would still be good since I've only
> > started in scheduler stuff.
> >
> > Another thing I recently pondered for tdr races looking at i915 code is
> > whether the tdr should first block the completion fence for that job. My
> > motivation is to have a race-free error capture (if the completion races
> > then we might start evicting memory and everything goes boom), but maybe
> > that helps here too. Some kind of atomic "block this fence from completing"
> > thing.
> >
> > Or am I completely guessing in the wrong direction?
> > -Daniel
> >
> > > Andrey
> > >
> > >
> > > > > Andrey
> > > > >
> > > > >
> > > > > > > Andrey
> > > > > > >
> > > > > > >
> > > > > > > > > spin_unlock(&sched->job_list_lock);
> > > > > > > > >
> > > > > > > > > status =
> > > > > > > > > job->sched->ops->timedout_job(job);
> > > > > > > > > @@ -342,6 +342,7 @@ static void drm_sched_job_timedout(struct
> > > > > > > > > work_struct *work)
> > > > > > > > > job->sched->ops->free_job(job);
> > > > > > > > > sched->free_guilty = false;
> > > > > > > > > }
> > > > > > > > > + dma_fence_put(fence);
> > > > > > > > > } else {
> > > > > > > > > spin_unlock(&sched->job_list_lock);
> > > > > > > > > }
> > > > > > > > > @@ -392,20 +393,6 @@ void drm_sched_stop(struct
> > > > > > > > > drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> > > > > > > > >
> > > > > > > > > kthread_park(sched->thread);
> > > > > > > > >
> > > > > > > > > - /*
> > > > > > > > > - * Reinsert back the bad job here - now it's safe as
> > > > > > > > > - * drm_sched_get_cleanup_job cannot race against us
> > > > > > > > > and release the
> > > > > > > > > - * bad job at this point - we parked (waited for) any
> > > > > > > > > in progress
> > > > > > > > > - * (earlier) cleanups and drm_sched_get_cleanup_job
> > > > > > > > > will not be called
> > > > > > > > > - * now until the scheduler thread is unparked.
> > > > > > > > > - */
> > > > > > > > > - if (bad && bad->sched == sched)
> > > > > > > > > - /*
> > > > > > > > > - * Add at the head of the queue to reflect it
> > > > > > > > > was the earliest
> > > > > > > > > - * job extracted.
> > > > > > > > > - */
> > > > > > > > > - list_add(&bad->list, &sched->pending_list);
> > > > > > > > > -
> > > > > > > > > /*
> > > > > > > > > * Iterate the job list from later to earlier
> > > > > > > > > one and either deactive
> > > > > > > > > * their HW callbacks or remove them from
> > > > > > > > > pending list if they already
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > > >
> > > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
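[For reference, below is a condensed sketch of the kthread_park()-based timeout flow debated in this thread, distilled from the sample patch quoted above. The function name is invented for illustration; the struct drm_gpu_scheduler fields used are the ones visible in the quoted diffs, and error handling plus the resubmit path are omitted.]

#include <linux/kthread.h>
#include <linux/spinlock.h>
#include <linux/list.h>
#include <drm/gpu_scheduler.h>

/*
 * Sketch: park sched->thread for the whole timeout handler so the scheduler
 * main loop cannot free the bad job underneath the driver's timedout_job()
 * callback; the list_del()/reinsert dance then becomes unnecessary.
 */
static void sketch_sched_job_timedout(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *job;

	/* Stop the scheduler main loop first so nothing races with us. */
	kthread_park(sched->thread);

	spin_lock(&sched->job_list_lock);
	job = list_first_entry_or_null(&sched->pending_list,
				       struct drm_sched_job, list);
	spin_unlock(&sched->job_list_lock);

	/* The job cannot be freed while the thread is parked. */
	if (job)
		job->sched->ops->timedout_job(job);

	/* Let the scheduler run again once the handler is done. */
	kthread_unpark(sched->thread);
}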