On Thu, 2025-08-07 at 16:15 +0200, Christian König wrote:
> On 05.08.25 12:22, Philipp Stanner wrote:
> > On Tue, 2025-08-05 at 11:05 +0200, Christian König wrote:
> > > On 24.07.25 17:07, Philipp Stanner wrote:
> > > > > +/**
> > > > > + * DOC: Scheduler Fence Object
> > > > > + *
> > > > > + * The scheduler fence object (&struct drm_sched_fence) encapsulates the whole
> > > > > + * time from pushing the job into the scheduler until the hardware has finished
> > > > > + * processing it. It is managed by the scheduler. The implementation provides
> > > > > + * dma_fence interfaces for signaling both scheduling of a command submission
> > > > > + * as well as finishing of processing.
> > > > > + *
> > > > > + * The lifetime of this object also follows normal dma_fence refcounting rules.
> > > > > + */
> > > > 
> > > > The relic I'm most unsure about is this docu for the scheduler fence.
> > > > I know that some drivers are accessing the s_fence, but I strongly
> > > > suspect that this is a) unnecessary and b) dangerous.
> > > 
> > > Which s_fence member do you mean? The one in the job? That should be 
> > > harmless as far as I can see.
> > 
> > I'm talking about struct drm_sched_fence.
> 
> Yeah that is necessary for the drivers to know about. We could potentially 
> abstract it better but we can't really hide it completely.
> 
> > > 
> > > > But the original draft from Christian hinted at that. So, @Christian,
> > > > this would be an opportunity to discuss this matter.
> > > > 
> > > > Otherwise I'd drop this docu section in v2. What users don't know, they
> > > > cannot misuse.
> > > 
> > > I would rather like to keep that to avoid misusing the job as the object 
> > > for tracking the submission lifetime.
> > 
> > Why would a driver ever want to access struct drm_sched_fence? The
> > driver knows when it signaled the hardware fence, and it knows when its
> > callbacks run_job() and free_job() were invoked.
> > 
> > I'm open to learn what amdgpu does there and why.
> 
> The simplest use case is performance optimization. You sometimes have 
> submissions which ideally run with others at the same time.
> 
> So we have AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES which basically tries to 
> cast a fence to a scheduler fence and then only waits for the dependency to 
> be pushed to the HW instead of waiting for it to finish (see amdgpu_cs.c).

But the driver recognizes that a certain fence got / gets pushed right
now through backend_ops.run_job(), doesn't it?
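
If I got that right, the trick is then just to wait on the "scheduled"
part of the scheduler fence instead of the "finished" part. Roughly
like the sketch below (just how I read it; pick_scheduled_dep() is a
made-up helper relying on to_drm_sched_fence() from
<drm/gpu_scheduler.h>, not amdgpu's actual code):

	/*
	 * If a dependency was produced by a drm_sched entity, only wait
	 * for it being pushed to the HW (->scheduled) instead of for its
	 * completion (->finished).
	 */
	static struct dma_fence *pick_scheduled_dep(struct dma_fence *fence)
	{
		struct drm_sched_fence *s_fence = to_drm_sched_fence(fence);

		if (s_fence) {
			/* Scheduler fence: ->scheduled signals once the
			 * job has been handed to run_job().
			 */
			return dma_fence_get(&s_fence->scheduled);
		}

		/* Not a scheduler fence, fall back to full completion. */
		return dma_fence_get(fence);
	}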

> 
> Another example are gang submissions (where I still have the TODO to actually 
> fix the code to not crash in an OOM situation).
> 
> Here we have a gang leader and gang members which are guaranteed to run 
> together on the HW at the same time.
> 
> This works by adding scheduled dependencies to the gang leader so that the 
> scheduler pushes it to the HW only after all gang members have been pushed.
> 
> The first gang member pushed now triggers dependency handling which makes 
> sure that no other gang can be pushed until the gang leader is pushed as well.

You mean amdgpu registers callbacks to drm_sched_fence?
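
Or is it done without callbacks, purely via scheduler-level
dependencies? I.e. something like this sketch (leader_job / member_job
are made-up driver structs embedding a drm_sched_job as "base"):

	/*
	 * Let the gang leader depend on the member's "scheduled" fence,
	 * so the scheduler pushes the leader only once the member has
	 * been handed to the HW. drm_sched_job_add_dependency() takes
	 * over the fence reference.
	 */
	int r = drm_sched_job_add_dependency(&leader_job->base,
			dma_fence_get(&member_job->base.s_fence->scheduled));
	if (r)
		return r;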

> 
> > > > > +/**
> > > > > + * DOC: Error and Timeout handling
> > > > > + *
> > > > > + * Errors are signaled by using dma_fence_set_error() on the hardware fence
> > > > > + * object before signaling it with dma_fence_signal(). Errors are then bubbled
> > > > > + * up from the hardware fence to the scheduler fence.
> > > > > + *
> > > > > + * The entity allows querying errors on the last run submission using the
> > > > > + * drm_sched_entity_error() function which can be used to cancel queued
> > > > > + * submissions in &struct drm_sched_backend_ops.run_job as well as preventing
> > > > > + * pushing further ones into the entity in the driver's submission function.
> > > > > + *
> > > > > + * When the hardware fence doesn't signal within a configurable amount of time
> > > > > + * &struct drm_sched_backend_ops.timedout_job gets invoked. The driver should
> > > > > + * then follow the procedure described in that callback's documentation.
> > > > > + *
> > > > > + * (TODO: The timeout handler should probably switch to using the hardware
> > > > > + * fence as parameter instead of the job. Otherwise the handling will always
> > > > > + * race between timing out and signaling the fence).
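
(Side note, to make sure I read the drm_sched_entity_error() part
correctly: a driver-side run_job() would then look roughly like the
sketch below. my_job, to_my_job() and my_ring_submit() are made up;
returning an ERR_PTR() from run_job() is, as far as I can see, how a
queued submission gets canceled.)

	static struct dma_fence *my_run_job(struct drm_sched_job *sched_job)
	{
		struct my_job *job = to_my_job(sched_job);
		int err = drm_sched_entity_error(sched_job->entity);

		if (err) {
			/* A previous submission on this entity failed;
			 * don't touch the HW, just propagate the error.
			 */
			return ERR_PTR(err);
		}

		/* On a HW error, the driver's completion path would call
		 * dma_fence_set_error() on the HW fence before
		 * dma_fence_signal(), so the error bubbles up to the
		 * scheduler fence as described above.
		 */
		return my_ring_submit(job);
	}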
> > > > 
> > > > This TODO can probably be removed, too. The recently merged
> > > > DRM_GPU_SCHED_STAT_NO_HANG has solved this issue.
> > > 
> > > No, it only scratched the surface of the problems here.
> > > 
> > > I'm seriously considering sending an RFC patch to clean up the job lifetime 
> > > and implementing this change.
> > > 
> > > Not necessarily giving the HW fence as parameter to the timeout callback, 
> > > but more generally not letting the scheduler depend on driver behavior.
> > 
> > That's rather vague. Regarding this TODO, "racing between timing out
> > and signaling the fence" can now be corrected by the driver. Are there
> > more issues? If so, we want to add a new FIXME for them.
> 
> Yeah good point. We basically worked around all those issues now.
> 
> It's just that I still see that we are missing a general concept. E.g. we 
> applied workaround on top of workaround until it didn't crash any more, 
> instead of saying: OK, this is the design, does it work? Is it valid? etc...

Yes, that seems to have been our destiny for a while now :) :(

What I'm afraid of right now is that with the callbacks vs.
drm_sched_fence we now potentially have several distinct mechanisms for
tracking job completion. The hardware fence is clearly the relevant
synchronization object for telling when a job is completed; yet, we
also have s_fence->finished.

Using it (for what?) is even encouraged by the docu:

        /**
         * @finished: this fence is what will be signaled by the scheduler
         * when the job is completed.
         *
         * When setting up an out fence for the job, you should use
         * this, since it's available immediately upon
         * drm_sched_job_init(), and the fence returned by the driver
         * from run_job() won't be created until the dependencies have
         * resolved.
         */
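
So as far as I understand, a typical driver submit path would do
something like this (sketch; "job" is the drm_sched_job, out_syncobj is
made up, and the syncobj handling is heavily abbreviated):

	/* After drm_sched_job_init()/drm_sched_job_arm(): hand out the
	 * scheduler's "finished" fence as the submission's out-fence,
	 * long before run_job() has produced the actual HW fence.
	 */
	drm_syncobj_replace_fence(out_syncobj, &job->s_fence->finished);

Which means userspace effectively waits on s_fence->finished, while the
scheduler internally waits on the HW fence and forwards its signaling
(and error) to s_fence->finished. That duplication is precisely what I
would like us to look at.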


Anyways.
I think this is a big topic very suitable for our workshop at XDC. I
also have some ideas about paths forward that I want to present.


P.

> 
> > That said, such an RFC would obviously be great. We can discuss the
> > paragraph above there, if you want.
> 
> I will try to hack something together. Not necessarily complete but it should 
> show the direction.
> 
> Christian.
> 
> > 
> > 
> > Regards
> > P.
