On Tue, Jul 01, 2025 at 09:40:05AM +0200, Christian König wrote:
> On 13.06.25 23:20, Matthew Brost wrote:
> > A small race exists between spsc_queue_push and the run-job worker, in
> > which spsc_queue_push may return not-first while the run-job worker has
> > already idled due to the job count being zero. If this race occurs, job
> > scheduling stops, leading to hangs while waiting on the job’s DMA
> > fences.
> >
> > Seal this race by incrementing the job count before appending to the
> > SPSC queue.
> >
> > This race was observed on a drm-tip 6.16-rc1 build with the Xe driver in
> > an SVM test case.
> >
> > Fixes: 1b1f42d8fde4 ("drm: move amd_gpu_scheduler into common location")
> > Fixes: 27105db6c63a ("drm/amdgpu: Add SPSC queue to scheduler.")
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Matthew Brost <matthew.br...@intel.com>
>
> Sorry for the late response, if it isn't already pushed to drm-misc-fixes
> then feel free to add Reviewed-by: Christian König <christian.koe...@amd.com>
>

Thanks. Just pushed to drm-misc-fixes.

Matt

> > ---
> >  include/drm/spsc_queue.h | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/drm/spsc_queue.h b/include/drm/spsc_queue.h
> > index 125f096c88cb..ee9df8cc67b7 100644
> > --- a/include/drm/spsc_queue.h
> > +++ b/include/drm/spsc_queue.h
> > @@ -70,9 +70,11 @@ static inline bool spsc_queue_push(struct spsc_queue *queue, struct spsc_node *n
> >
> >  	preempt_disable();
> >
> > +	atomic_inc(&queue->job_count);
> > +	smp_mb__after_atomic();
> > +
> >  	tail = (struct spsc_node **)atomic_long_xchg(&queue->tail, (long)&node->next);
> >  	WRITE_ONCE(*tail, node);
> > -	atomic_inc(&queue->job_count);
> >
> >  	/*
> >  	 * In case of first element verify new node will be visible to the consumer
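
For anyone reading this in the archive, here is a rough userspace sketch of
the ordering the patch establishes: the job count is bumped before the node
is linked in, so a consumer that checks the count should not see zero while
a node is already on its way into the queue. This is not the kernel code.
It assumes C11 atomics as a stand-in for atomic_inc(), atomic_long_xchg()
and WRITE_ONCE(), relies on the default seq_cst ordering in place of
smp_mb__after_atomic(), and leaves out preempt_disable() and the trailing
smp_wmb(). The helpers spsc_queue_init() and spsc_queue_count() are
simplified illustrations written for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel structures (illustrative only). */
struct spsc_node {
	struct spsc_node *_Atomic next;
};

struct spsc_queue {
	struct spsc_node *_Atomic head;
	/* Points at the 'next' slot that the following push must fill. */
	struct spsc_node *_Atomic *_Atomic tail;
	atomic_int job_count;
};

static void spsc_queue_init(struct spsc_queue *q)
{
	atomic_init(&q->head, NULL);
	atomic_init(&q->tail, &q->head);
	atomic_init(&q->job_count, 0);
}

static int spsc_queue_count(struct spsc_queue *q)
{
	return atomic_load(&q->job_count);
}

/* Returns true if the node became the first element of the queue. */
static bool spsc_queue_push(struct spsc_queue *q, struct spsc_node *node)
{
	struct spsc_node *_Atomic *tail;

	assert(node != NULL);
	atomic_store(&node->next, NULL);

	/*
	 * Count first. The seq_cst read-modify-write is assumed here to play
	 * the role of atomic_inc() followed by smp_mb__after_atomic().
	 */
	atomic_fetch_add(&q->job_count, 1);

	/* Claim the tail slot, then publish the node through it. */
	tail = atomic_exchange(&q->tail, &node->next);
	atomic_store(tail, node);

	return tail == &q->head;
}
```

With the old ordering (link first, count last) a second producer call could
return not-first while the count the worker reads is still zero, which is
the window described in the commit message.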