On Mon, Jun 16, 2025 at 1:45 PM Christian König <christian.koe...@amd.com> wrote:
>
> On 6/16/25 15:47, Alex Deucher wrote:
> > On Mon, Jun 16, 2025 at 8:16 AM Christian König <christian.koe...@amd.com> wrote:
> >>
> >> On 6/13/25 23:47, Alex Deucher wrote:
> >>> Use the amdgpu fence container so we can store additional
> >>> data in the fence. This also fixes the start_time handling
> >>> for MCBP since we were casting the fence to an amdgpu_fence
> >>> and it wasn't.
> >>>
> >>> Fixes: 3f4c175d62d8 ("drm/amdgpu: MCBP based on DRM scheduler (v9)")
> >>> Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
> >>> ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +-
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +-
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c   | 30 +++++----------------
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     | 12 ++++-----
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h     |  2 +-
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h    | 16 +++++++++++
> >>>  6 files changed, 32 insertions(+), 32 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> >>> index 8e626f50b362e..f81608330a3d0 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> >>> @@ -1902,7 +1902,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct amdgpu_ring *ring)
> >>>                         continue;
> >>>                 }
> >>>                 job = to_amdgpu_job(s_job);
> >>> -               if (preempted && (&job->hw_fence) == fence)
> >>> +               if (preempted && (&job->hw_fence.base) == fence)
> >>>                         /* mark the job as preempted */
> >>>                         job->preemption_status |= AMDGPU_IB_PREEMPTED;
> >>>         }
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> index 00174437b01ec..4893f834f4fd4 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> @@ -6397,7 +6397,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
> >>>          *
> >>>          * job->base holds a reference to parent fence
> >>>          */
> >>> -       if (job && dma_fence_is_signaled(&job->hw_fence)) {
> >>> +       if (job && dma_fence_is_signaled(&job->hw_fence.base)) {
> >>>                 job_signaled = true;
> >>>                 dev_info(adev->dev, "Guilty job already signaled, skipping HW reset");
> >>>                 goto skip_hw_reset;
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>> index 2f24a6aa13bf6..569e0e5373927 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>> @@ -41,22 +41,6 @@
> >>>  #include "amdgpu_trace.h"
> >>>  #include "amdgpu_reset.h"
> >>>
> >>> -/*
> >>> - * Fences mark an event in the GPUs pipeline and are used
> >>> - * for GPU/CPU synchronization. When the fence is written,
> >>> - * it is expected that all buffers associated with that fence
> >>> - * are no longer in use by the associated ring on the GPU and
> >>> - * that the relevant GPU caches have been flushed.
> >>> - */
> >>> -
> >>> -struct amdgpu_fence {
> >>> -       struct dma_fence base;
> >>> -
> >>> -       /* RB, DMA, etc. */
> >>> -       struct amdgpu_ring *ring;
> >>> -       ktime_t start_timestamp;
> >>> -};
> >>> -
> >>>  static struct kmem_cache *amdgpu_fence_slab;
> >>>
> >>>  int amdgpu_fence_slab_init(void)
> >>> @@ -151,12 +135,12 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd
> >>>                 am_fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
> >>>                 if (am_fence == NULL)
> >>>                         return -ENOMEM;
> >>> -               fence = &am_fence->base;
> >>> -               am_fence->ring = ring;
> >>>         } else {
> >>>                 /* take use of job-embedded fence */
> >>> -               fence = &job->hw_fence;
> >>> +               am_fence = &job->hw_fence;
> >>>         }
> >>> +       fence = &am_fence->base;
> >>> +       am_fence->ring = ring;
> >>
> >> I would rather completely drop the job from the parameters and the
> >> general fence allocation here.
> >>
> >> Instead we should just provide a fence as input parameter and submit
> >> that one.
> >>
> >> This should make sure that we don't run into such issues again.
> >
> > How about doing that as a follow-on patch? It looks like that will be
> > a much bigger patch for a stable bug fix. I think we can clean up a
> > lot of stuff in amdgpu_fence.c with that change.
>
> Works for me. I would also suggest removing the kmem_cache_alloc() and
> just using kmalloc() for the rare cases where we need an independent
> fence.
>
> In addition to that, the ring and start_time members look suspicious.
> We should not have those inside the fence in the first place.
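For illustration, Christian's suggested interface might look roughly like the
following. This is a hypothetical sketch, not code from this series; the
signature and the helper name are assumptions:

/*
 * Hypothetical: the caller always hands in the amdgpu_fence, whether
 * job-embedded or freshly allocated, so the emit path never has to
 * guess what kind of dma_fence it is dealing with.
 */
int amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
                      unsigned int flags);

/* An independent fence then becomes a plain kmalloc(), no slab cache: */
static int example_emit_standalone(struct amdgpu_ring *ring)
{
        struct amdgpu_fence *af = kmalloc(sizeof(*af), GFP_ATOMIC);

        if (!af)
                return -ENOMEM;
        return amdgpu_fence_emit(ring, af, 0);
}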
The ring member is used in a number of places to get from the fence to
the fence_drv and the ring name (see the sketch below). The start_time
is from MCBP; I don't remember the details.

While we are here, I think we can remove job->job_run_counter as well?
We don't support resubmission anymore.

Alex
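A minimal sketch of that fence-to-ring navigation, assuming the post-patch
layout where the job-embedded fence really is an amdgpu_fence; the helper is
just the obvious container_of() wrapper and the names are illustrative:

static struct amdgpu_fence *to_amdgpu_fence(struct dma_fence *f)
{
        return container_of(f, struct amdgpu_fence, base);
}

static const char *example_fence_ring_name(struct dma_fence *f)
{
        /*
         * Only valid because hw_fence is now an amdgpu_fence; before
         * this patch the equivalent cast was bogus for job fences.
         */
        struct amdgpu_ring *ring = to_amdgpu_fence(f)->ring;

        return ring->name;
}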
>
> Regards,
> Christian.
>
> >
> > Alex
> >
> >>
> >> Apart from that looks good to me,
> >> Christian.
> >>>
> >>>         seq = ++ring->fence_drv.sync_seq;
> >>>         if (job && job->job_run_counter) {
> >>> @@ -718,7 +702,7 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
> >>>                          * it right here or we won't be able to track them in fence_drv
> >>>                          * and they will remain unsignaled during sa_bo free.
> >>>                          */
> >>> -                       job = container_of(old, struct amdgpu_job, hw_fence);
> >>> +                       job = container_of(old, struct amdgpu_job, hw_fence.base);
> >>>                         if (!job->base.s_fence && !dma_fence_is_signaled(old))
> >>>                                 dma_fence_signal(old);
> >>>                         RCU_INIT_POINTER(*ptr, NULL);
> >>> @@ -780,7 +764,7 @@ static const char *amdgpu_fence_get_timeline_name(struct dma_fence *f)
> >>>
> >>>  static const char *amdgpu_job_fence_get_timeline_name(struct dma_fence *f)
> >>>  {
> >>> -       struct amdgpu_job *job = container_of(f, struct amdgpu_job, hw_fence);
> >>> +       struct amdgpu_job *job = container_of(f, struct amdgpu_job, hw_fence.base);
> >>>
> >>>         return (const char *)to_amdgpu_ring(job->base.sched)->name;
> >>>  }
> >>> @@ -810,7 +794,7 @@ static bool amdgpu_fence_enable_signaling(struct dma_fence *f)
> >>>   */
> >>>  static bool amdgpu_job_fence_enable_signaling(struct dma_fence *f)
> >>>  {
> >>> -       struct amdgpu_job *job = container_of(f, struct amdgpu_job, hw_fence);
> >>> +       struct amdgpu_job *job = container_of(f, struct amdgpu_job, hw_fence.base);
> >>>
> >>>         if (!timer_pending(&to_amdgpu_ring(job->base.sched)->fence_drv.fallback_timer))
> >>>                 amdgpu_fence_schedule_fallback(to_amdgpu_ring(job->base.sched));
> >>> @@ -845,7 +829,7 @@ static void amdgpu_job_fence_free(struct rcu_head *rcu)
> >>>         struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
> >>>
> >>>         /* free job if fence has a parent job */
> >>> -       kfree(container_of(f, struct amdgpu_job, hw_fence));
> >>> +       kfree(container_of(f, struct amdgpu_job, hw_fence.base));
> >>>  }
> >>>
> >>>  /**
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>> index acb21fc8b3ce5..ddb9d3269357c 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>> @@ -272,8 +272,8 @@ void amdgpu_job_free_resources(struct amdgpu_job *job)
> >>>         /* Check if any fences where initialized */
> >>>         if (job->base.s_fence && job->base.s_fence->finished.ops)
> >>>                 f = &job->base.s_fence->finished;
> >>> -       else if (job->hw_fence.ops)
> >>> -               f = &job->hw_fence;
> >>> +       else if (job->hw_fence.base.ops)
> >>> +               f = &job->hw_fence.base;
> >>>         else
> >>>                 f = NULL;
> >>>
> >>> @@ -290,10 +290,10 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
> >>>         amdgpu_sync_free(&job->explicit_sync);
> >>>
> >>>         /* only put the hw fence if has embedded fence */
> >>> -       if (!job->hw_fence.ops)
> >>> +       if (!job->hw_fence.base.ops)
> >>>                 kfree(job);
> >>>         else
> >>> -               dma_fence_put(&job->hw_fence);
> >>> +               dma_fence_put(&job->hw_fence.base);
> >>>  }
> >>>
> >>>  void amdgpu_job_set_gang_leader(struct amdgpu_job *job,
> >>> @@ -322,10 +322,10 @@ void amdgpu_job_free(struct amdgpu_job *job)
> >>>         if (job->gang_submit != &job->base.s_fence->scheduled)
> >>>                 dma_fence_put(job->gang_submit);
> >>>
> >>> -       if (!job->hw_fence.ops)
> >>> +       if (!job->hw_fence.base.ops)
> >>>                 kfree(job);
> >>>         else
> >>> -               dma_fence_put(&job->hw_fence);
> >>> +               dma_fence_put(&job->hw_fence.base);
> >>>  }
> >>>
> >>>  struct dma_fence *amdgpu_job_submit(struct amdgpu_job *job)
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> >>> index f2c049129661f..931fed8892cc1 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> >>> @@ -48,7 +48,7 @@ struct amdgpu_job {
> >>>         struct drm_sched_job base;
> >>>         struct amdgpu_vm *vm;
> >>>         struct amdgpu_sync explicit_sync;
> >>> -       struct dma_fence hw_fence;
> >>> +       struct amdgpu_fence hw_fence;
> >>>         struct dma_fence *gang_submit;
> >>>         uint32_t preamble_status;
> >>>         uint32_t preemption_status;
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> >>> index b95b471107692..e1f25218943a4 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> >>> @@ -127,6 +127,22 @@ struct amdgpu_fence_driver {
> >>>         struct dma_fence **fences;
> >>>  };
> >>>
> >>> +/*
> >>> + * Fences mark an event in the GPUs pipeline and are used
> >>> + * for GPU/CPU synchronization. When the fence is written,
> >>> + * it is expected that all buffers associated with that fence
> >>> + * are no longer in use by the associated ring on the GPU and
> >>> + * that the relevant GPU caches have been flushed.
> >>> + */
> >>> +
> >>> +struct amdgpu_fence {
> >>> +       struct dma_fence base;
> >>> +
> >>> +       /* RB, DMA, etc. */
> >>> +       struct amdgpu_ring *ring;
> >>> +       ktime_t start_timestamp;
> >>> +};
> >>> +
> >>>  extern const struct drm_sched_backend_ops amdgpu_sched_ops;
> >>>
> >>>  void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring);
> >>
>
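To make the core of the fix concrete outside the kernel tree: the change is
one extra level of struct embedding, so container_of() can walk from the
dma_fence through the amdgpu_fence to the job. A self-contained userspace
sketch with toy structs that only mirror the driver's layout (container_of()
spelled out so it builds with a plain C compiler):

#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct dma_fence { unsigned long long seqno; };

/* Mirrors the moved struct: a dma_fence embedded in an amdgpu_fence. */
struct amdgpu_fence {
        struct dma_fence base;
        void *ring;             /* stand-in for struct amdgpu_ring * */
};

/* Mirrors amdgpu_job after the patch: hw_fence is a full amdgpu_fence. */
struct amdgpu_job {
        struct amdgpu_fence hw_fence;
        int preemption_status;
};

int main(void)
{
        struct amdgpu_job job = { .hw_fence.base.seqno = 42 };
        struct dma_fence *f = &job.hw_fence.base;

        /* The fixed pattern: go through hw_fence.base, not hw_fence. */
        struct amdgpu_job *j = container_of(f, struct amdgpu_job, hw_fence.base);
        struct amdgpu_fence *af = container_of(f, struct amdgpu_fence, base);

        printf("seqno=%llu same_job=%d\n", af->base.seqno, j == &job);
        return 0;
}

Before the patch, hw_fence was a bare struct dma_fence, so casting
&job->hw_fence to struct amdgpu_fence * read past the end of the member;
embedding the full container makes both container_of() steps legitimate.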