Re: [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans

Matthew Brost Tue, 04 Apr 2023 11:09:36 -0700

On Tue, Apr 04, 2023 at 12:02:03PM -0600, Zeng, Oak wrote:
> Hi Matt, Thomas,
> 
> Some very bold out-of-the-box thinking in this area:
> 
> 1. So you want to use the DRM scheduler and dma-fence for long-running
> workloads. Why do you want to do this in the first place? What is the
> benefit? The DRM scheduler is pretty much a software scheduler. Modern GPUs
> have a scheduler built in at the fw/hw level; as you said below, for Intel
> this is the GuC. Can the Xe driver just submit jobs directly to the GuC,
> bypassing the DRM scheduler?
>


If we did that, we would have 2 paths for dependency tracking, flow
controlling the ring, and resets / error handling / backend submission
implementations. We don't want this.
 
> 2. Using dma-fence for long-running workloads: I am well aware that page
> faults (and the consequent memory allocation / lock acquisition to fix the
> fault) can cause a deadlock for a dma-fence wait. But I am not convinced
> that dma-fence can't be used purely because of the nature of the workload,
> i.e. that it runs very long (indefinitely). I did the math: the
> dma_fence_wait_timeout function's third param is the timeout, which is a
> signed long type. If HZ is 1000, this is about 25 days for a 32-bit long.
> If 25 days is not long enough, can we just change the timeout parameter to
> a signed 64-bit type so it is much longer than our lifetime...
> 
> So my main argument is that we can't use dma-fence for long-running
> workloads not because the workload runs very long, but because we use page
> faults for long-running workloads. If we enabled page faults for
> short-running workloads, we couldn't use dma-fence either. Page fault is
> the key thing here.
> 
> Now, since we use page faults, which are *fundamentally* in conflict with
> the dma-fence design, why not just introduce an independent concept such as
> a user-fence instead of extending the existing dma-fence?
> 
> I like a unified design. If the DRM scheduler and dma-fence can be extended
> to work for everything, that is beautiful. But it seems we have a
> fundamental problem here.
>
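
For reference, the timeout arithmetic above can be checked with a small
userspace sketch. It assumes HZ is 1000 and that the timeout is a count of
jiffies; max_timeout_days() is a made-up helper for illustration, not a
kernel function. A 32-bit signed long comes to a bit under 25 days, while a
64-bit signed long (what long already is on 64-bit kernels) is effectively
unbounded.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: convert a maximum timeout
 * expressed in jiffies into whole days for a given tick rate (HZ).
 */
static int64_t max_timeout_days(int64_t max_jiffies, int64_t hz)
{
	assert(hz > 0);
	return max_jiffies / hz / (60 * 60 * 24);
}
```

Calling it with INT32_MAX and INT64_MAX as max_jiffies gives the 32-bit and
64-bit limits respectively.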

Thomas's patches turn a dma-fence into a KMD sync point (e.g. we just use
the signal / CB infrastructure) and enforce that we don't use these
dma-fences from the scheduler in memory reclaim paths or export them to
user space or other drivers. Think of this mode as a SW-only fence.
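
As a rough illustration of the SW-only idea, here is a toy userspace model.
All names (toy_fence, toy_fence_signal, toy_fence_export) are invented for
this sketch and are not the dma-fence API or the code in the patches; the
only point is that a fence annotated as long running still signals and runs
callbacks, but any attempt to export it is refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; these names are illustrative, not the dma-fence API. */
struct toy_fence {
	bool signaled;
	bool long_running;	/* annotation: KMD-internal sync point only */
	void (*cb)(struct toy_fence *f);	/* single completion callback */
};

/* Signal the fence and run its callback once; works for both kinds. */
static void toy_fence_signal(struct toy_fence *f)
{
	assert(f != NULL);
	if (f->signaled)
		return;
	f->signaled = true;
	if (f->cb)
		f->cb(f);
}

/*
 * Export to user space / other drivers: refused for long-running fences.
 * Returns 0 on success, -1 if the fence must stay inside the driver.
 */
static int toy_fence_export(const struct toy_fence *f)
{
	assert(f != NULL);
	return f->long_running ? -1 : 0;
}
```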

Matt
 
> Thanks,
> Oak
> 
> > -----Original Message-----
> > From: dri-devel <dri-devel-boun...@lists.freedesktop.org> On Behalf Of
> > Matthew Brost
> > Sent: April 3, 2023 8:22 PM
> > To: dri-devel@lists.freedesktop.org; intel...@lists.freedesktop.org
> > Cc: robdcl...@chromium.org; thomas.hellst...@linux.intel.com; 
> > airl...@linux.ie;
> > l...@asahilina.net; boris.brezil...@collabora.com; Brost, Matthew
> > <matthew.br...@intel.com>; christian.koe...@amd.com;
> > faith.ekstr...@collabora.com
> > Subject: [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans
> > 
> > Hello,
> > 
> > As a prerequisite to merging the new Intel Xe DRM driver [1] [2], we
> > have been asked to merge our common DRM scheduler patches first as well
> > as develop a common solution for long running workloads with the DRM
> > scheduler. This RFC series is our first attempt at doing this. We
> > welcome any and all feedback.
> > 
> > This can be thought of as 4 parts, detailed below.
> > 
> > - DRM scheduler changes for 1 to 1 relationship between scheduler and
> > entity (patches 1-3)
> > 
> > In Xe all of the scheduling of jobs is done by a firmware scheduler (the
> > GuC), which is a new paradigm WRT the DRM scheduler and presents
> > several problems, as the DRM scheduler was originally designed to
> > schedule jobs on hardware queues. The main problem is that the DRM
> > scheduler expects the submission order of jobs to be the completion
> > order of jobs, even across multiple entities. This assumption falls
> > apart with a firmware scheduler, as a firmware scheduler has no concept
> > of jobs and jobs can complete out of order. A novel solution for this
> > was originally thought of by Faith during the initial prototype of Xe:
> > create a 1 to 1 relationship between scheduler and entity. I believe the
> > AGX driver [3] is using this approach and Boris may use this approach as
> > well for the Mali driver [4].
> > 
> > To support a 1 to 1 relationship we move the main execution function
> > from a kthread to a work queue and add a new scheduling mode which
> > bypasses code in the DRM scheduler which isn't needed in a 1 to 1
> > relationship. The new scheduling mode should unify all drivers' usage
> > with a 1 to 1 relationship and can be thought of as using the scheduler
> > as a dependency / in-flight job tracker rather than a true scheduler.
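
The ordering problem described above can be sketched with a toy userspace
model. Everything here (toy_job, completes_in_order, ALL_ENTITIES) is made
up for illustration and is not DRM scheduler code. It also assumes that jobs
within a single entity complete in order, which is an assumption of the
sketch rather than something stated in the cover letter. A scheduler shared
across entities sees completions out of submission order, while a scheduler
per entity does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_job {
	int entity;	/* which entity submitted the job */
	int seq;	/* global submission order */
};

#define ALL_ENTITIES (-1)

/*
 * Walk a completion log and check that the jobs visible to one scheduler
 * finished in submission order. ALL_ENTITIES models a scheduler shared by
 * every entity; a specific entity id models a 1 to 1 scheduler.
 */
static bool completes_in_order(const struct toy_job *done, size_t n, int entity)
{
	int last = -1;

	assert(done != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (entity != ALL_ENTITIES && done[i].entity != entity)
			continue;
		if (done[i].seq < last)
			return false;
		last = done[i].seq;
	}
	return true;
}
```

With a completion log in which the firmware ran entity 1's first job before
entity 0's, the shared view fails the check while each per-entity view
passes.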
> > 
> > - Generic messaging interface for DRM scheduler
> > 
> > The idea is to be able to communicate with the submission backend via
> > in-band (relative to the main execution function) messages. Messages are
> > backend defined and flexible enough for any use case. In Xe we use these
> > messages to clean up entities, set properties for entities, and suspend /
> > resume execution of an entity [5]. I suspect other drivers can leverage
> > this messaging concept too, as it is a convenient way to avoid races in
> > the backend.
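
A minimal sketch of why in-band messages help with races, as a toy
userspace model: jobs and messages go through one queue and are handled by
one worker, so a suspend or resume can never run concurrently with job
submission. The names (toy_sched, toy_sched_add, toy_sched_work) and the
message kinds are hypothetical and do not reflect the interface added in
the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_item_kind { TOY_JOB, TOY_MSG_SUSPEND, TOY_MSG_RESUME };

#define TOY_QUEUE_LEN 8

struct toy_sched {
	enum toy_item_kind items[TOY_QUEUE_LEN];
	size_t head, tail;	/* head: next to process, tail: next free slot */
	bool suspended;
	int jobs_run;
	int jobs_deferred;	/* jobs seen while suspended */
};

/* Queue a job or message; returns false if the queue is full. */
static bool toy_sched_add(struct toy_sched *s, enum toy_item_kind kind)
{
	assert(s != NULL);
	if (s->tail - s->head == TOY_QUEUE_LEN)
		return false;
	s->items[s->tail++ % TOY_QUEUE_LEN] = kind;
	return true;
}

/* The single worker: drains items strictly in the order they were added. */
static void toy_sched_work(struct toy_sched *s)
{
	assert(s != NULL);
	while (s->head != s->tail) {
		switch (s->items[s->head++ % TOY_QUEUE_LEN]) {
		case TOY_MSG_SUSPEND:
			s->suspended = true;
			break;
		case TOY_MSG_RESUME:
			s->suspended = false;
			break;
		case TOY_JOB:
			/* A real backend would keep the job queued; the toy only counts it. */
			if (s->suspended)
				s->jobs_deferred++;
			else
				s->jobs_run++;
			break;
		}
	}
}
```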
> > 
> > - Support for using TDR for all error paths of a scheduler / entity
> > 
> > Fix a few races / bugs, add a function to dynamically set the TDR timeout.
> > 
> > - Annotate dma-fences for long running workloads.
> > 
> > The idea here is to use dma-fences only as sync points within the
> > scheduler and never export them for long running workloads. By
> > annotating these fences as long running we ensure that these dma-fences
> > are never used in a way that breaks the dma-fence rules. A benefit of
> > this approach is the scheduler can still safely flow control the
> > execution ring buffer via the job limit without breaking the dma-fence
> > rules.
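
Flow control via a job limit can be pictured as a simple credit counter.
The sketch below is a toy userspace model with invented names (toy_ring,
toy_ring_try_submit, toy_ring_job_done), not scheduler code: a job is only
pushed to the ring while the number of in-flight jobs is below the limit,
and a slot is returned when the job's fence signals inside the scheduler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_ring {
	unsigned int job_limit;	/* max jobs allowed on the ring at once */
	unsigned int in_flight;	/* jobs submitted but not yet completed */
};

/* Try to place a job on the ring; returns false if the ring is full. */
static bool toy_ring_try_submit(struct toy_ring *r)
{
	assert(r != NULL);
	if (r->in_flight >= r->job_limit)
		return false;
	r->in_flight++;
	return true;
}

/* Called when a job's (scheduler-internal) fence signals: frees one slot. */
static void toy_ring_job_done(struct toy_ring *r)
{
	assert(r != NULL && r->in_flight > 0);
	r->in_flight--;
}
```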
> > 
> > Again, this is a first draft and we are looking forward to feedback.
> > 
> > Enjoy - Matt
> > 
> > [1] https://gitlab.freedesktop.org/drm/xe/kernel
> > [2] https://patchwork.freedesktop.org/series/112188/
> > [3] https://patchwork.freedesktop.org/series/114772/
> > [4] https://patchwork.freedesktop.org/patch/515854/?series=112188&rev=1
> > [5] https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_guc_submit.c#L1031
> > 
> > Matthew Brost (8):
> >   drm/sched: Convert drm scheduler to use a work queue rather than
> >     kthread
> >   drm/sched: Move schedule policy to scheduler / entity
> >   drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
> >   drm/sched: Add generic scheduler message interface
> >   drm/sched: Start run wq before TDR in drm_sched_start
> >   drm/sched: Submit job before starting TDR
> >   drm/sched: Add helper to set TDR timeout
> >   drm/syncobj: Warn on long running dma-fences
> > 
> > Thomas Hellström (2):
> >   dma-buf/dma-fence: Introduce long-running completion fences
> >   drm/sched: Support long-running sched entities
> > 
> >  drivers/dma-buf/dma-fence.c                 | 142 +++++++---
> >  drivers/dma-buf/dma-resv.c                  |   5 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  14 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  15 +-
> >  drivers/gpu/drm/drm_syncobj.c               |   5 +-
> >  drivers/gpu/drm/etnaviv/etnaviv_sched.c     |   5 +-
> >  drivers/gpu/drm/lima/lima_sched.c           |   5 +-
> >  drivers/gpu/drm/msm/adreno/adreno_device.c  |   6 +-
> >  drivers/gpu/drm/msm/msm_ringbuffer.c        |   5 +-
> >  drivers/gpu/drm/panfrost/panfrost_job.c     |   5 +-
> >  drivers/gpu/drm/scheduler/sched_entity.c    | 127 +++++++--
> >  drivers/gpu/drm/scheduler/sched_fence.c     |   6 +-
> >  drivers/gpu/drm/scheduler/sched_main.c      | 278 +++++++++++++++-----
> >  drivers/gpu/drm/v3d/v3d_sched.c             |  25 +-
> >  include/drm/gpu_scheduler.h                 | 130 +++++++--
> >  include/linux/dma-fence.h                   |  60 ++++-
> >  16 files changed, 649 insertions(+), 184 deletions(-)
> > 
> > --
> > 2.34.1
>
