On 6/3/25 14:48, Tvrtko Ursulin wrote: > > On 03/06/2025 13:40, Christian König wrote: >> On 6/3/25 13:30, Tvrtko Ursulin wrote: >>> >>> On 02/06/2025 19:00, Christian König wrote: >>>> On 6/2/25 17:25, Tvrtko Ursulin wrote: >>>>> >>>>> On 02/06/2025 15:42, Christian König wrote: >>>>>> On 6/2/25 15:05, Tvrtko Ursulin wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> On 15/05/2025 14:15, Christian König wrote: >>>>>>>> Hey drm-misc maintainers, >>>>>>>> >>>>>>>> can you guys please backmerge drm-next into drm-misc-next? >>>>>>>> >>>>>>>> I want to push this patch here but it depends on changes which are >>>>>>>> partially in drm-next and partially in drm-misc-next. >>>>>>> >>>>>>> Looks like the backmerge is still pending? >>>>>> >>>>>> Yes, @Maarten, @Maxime and @Thomas ping on this. >>>>>> >>>>>>> In the meantime, Christian, any chance you will have some bandwith to >>>>>>> think about the tail end of the series? Specifically patch 6 and how >>>>>>> that is used onward. >>>>>> >>>>>> Well the RCU grace period is quite a nifty hack. I wanted to go over it >>>>>> again after merging the first patches from this series. >>>>>> >>>>>> In general looks like a good idea to me, I just don't like that we >>>>>> explicitely need to expose dma_fence_access_begin() and >>>>>> dma_fence_access_end(). >>>>>> >>>>>> Especially we can't do that while calling fence->ops->release. >>>>> >>>>> Hm why not? You think something will take offence of the rcu_read_lock()? >>>> >>>> Yes, especially it is perfectly legitimate to call synchronize_rcu() or >>>> lock semaphores/mutexes from that callback. >>>> >>>> Either keep the RCU critical section only for the trace or even better >>>> come up with some different approach, e.g. copying the string under the >>>> RCU lock or something like that. >>> >>> Hmm but the kerneldoc explicity says callback can be called from irq >>> context: >>> >>> /** >>> * @release: >>> * >>> * Called on destruction of fence to release additional resources. >>> * Can be called from irq context. This callback is optional. If it is >>> * NULL, then dma_fence_free() is instead called as the default >>> * implementation. >>> */ >>> void (*release)(struct dma_fence *fence); >> >> Ah, right. I mixed that up with the dma-buf object. >> >> Yeah in that case that is probably harmless. We delegate the final free to a >> work item if necessary anyway. >> >> But I would still like to avoid having the RCU cover the release as well. Or >> why is there any reason why we would explicitely want to do this? > > I can't remember there was a particular reason. Obviously the driver/timeline > name vfunc access I needed a dma_fence_access_begin/end() block so maybe I > was just sloppy and put the end at the end of the function instead of at the > end of the block which can dereference them.
Yeah that's the next topic I would rather like to improve. We are kind of hiding that the returned strings are using RCU protection. In other words it would be nicer if we could add an __rcu tag to the get_driver_name/get_timeline_name callbacks and let the automated tools complain if somebody isn't doing the proper RCU handling. The problem is that as far as I know that is not supported by the automated tools (would be cool if somebody could double check that). +We would need to convert the get_timeline/get_timeline_name function to something like func(struct dma_fence *fence, const char __rcu **out) to make that work. Regards, Christian. > > I will pull it earlier for the next respin, assuming no gotchas get > discovered in the process. > > Regards, > > Tvrtko > >> >> Regards, >> Christian. >> >>> >>> >>> Regards, >>> >>> Tvrtko >>> >>>> >>>> Regards, >>>> Christian. >>>> >>>>> >>>>> Regards, >>>>> >>>>> Tvrtko >>>>> >>>>>>>> On 5/15/25 11:49, Tvrtko Ursulin wrote: >>>>>>>>> With the goal of reducing the need for drivers to touch (and >>>>>>>>> dereference) >>>>>>>>> fence->ops, we move the 64-bit seqnos flag from struct dma_fence_ops >>>>>>>>> to >>>>>>>>> the fence->flags. >>>>>>>>> >>>>>>>>> Drivers which were setting this flag are changed to use new >>>>>>>>> dma_fence_init64() instead of dma_fence_init(). >>>>>>>>> >>>>>>>>> v2: >>>>>>>>> * Streamlined init and added kerneldoc. >>>>>>>>> * Rebase for amdgpu userq which landed since. >>>>>>>>> >>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursu...@igalia.com> >>>>>>>>> Reviewed-by: Christian König <christian.koe...@amd.com> # v1 >>>>>>>>> --- >>>>>>>>> drivers/dma-buf/dma-fence-chain.c | 5 +- >>>>>>>>> drivers/dma-buf/dma-fence.c | 69 >>>>>>>>> ++++++++++++++----- >>>>>>>>> .../drm/amd/amdgpu/amdgpu_eviction_fence.c | 7 +- >>>>>>>>> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 5 +- >>>>>>>>> .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c | 5 +- >>>>>>>>> include/linux/dma-fence.h | 14 ++-- >>>>>>>>> 6 files changed, 64 insertions(+), 41 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/drivers/dma-buf/dma-fence-chain.c >>>>>>>>> b/drivers/dma-buf/dma-fence-chain.c >>>>>>>>> index 90424f23fd73..a8a90acf4f34 100644 >>>>>>>>> --- a/drivers/dma-buf/dma-fence-chain.c >>>>>>>>> +++ b/drivers/dma-buf/dma-fence-chain.c >>>>>>>>> @@ -218,7 +218,6 @@ static void dma_fence_chain_set_deadline(struct >>>>>>>>> dma_fence *fence, >>>>>>>>> } >>>>>>>>> const struct dma_fence_ops dma_fence_chain_ops = { >>>>>>>>> - .use_64bit_seqno = true, >>>>>>>>> .get_driver_name = dma_fence_chain_get_driver_name, >>>>>>>>> .get_timeline_name = dma_fence_chain_get_timeline_name, >>>>>>>>> .enable_signaling = dma_fence_chain_enable_signaling, >>>>>>>>> @@ -262,8 +261,8 @@ void dma_fence_chain_init(struct dma_fence_chain >>>>>>>>> *chain, >>>>>>>>> seqno = max(prev->seqno, seqno); >>>>>>>>> } >>>>>>>>> - dma_fence_init(&chain->base, &dma_fence_chain_ops, >>>>>>>>> - &chain->lock, context, seqno); >>>>>>>>> + dma_fence_init64(&chain->base, &dma_fence_chain_ops, >>>>>>>>> &chain->lock, >>>>>>>>> + context, seqno); >>>>>>>>> /* >>>>>>>>> * Chaining dma_fence_chain container together is only >>>>>>>>> allowed through >>>>>>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c >>>>>>>>> index f0cdd3e99d36..705b59787731 100644 >>>>>>>>> --- a/drivers/dma-buf/dma-fence.c >>>>>>>>> +++ b/drivers/dma-buf/dma-fence.c >>>>>>>>> @@ -989,24 +989,9 @@ void dma_fence_describe(struct dma_fence *fence, >>>>>>>>> struct seq_file *seq) >>>>>>>>> } >>>>>>>>> EXPORT_SYMBOL(dma_fence_describe); >>>>>>>>> -/** >>>>>>>>> - * dma_fence_init - Initialize a custom fence. >>>>>>>>> - * @fence: the fence to initialize >>>>>>>>> - * @ops: the dma_fence_ops for operations on this fence >>>>>>>>> - * @lock: the irqsafe spinlock to use for locking this fence >>>>>>>>> - * @context: the execution context this fence is run on >>>>>>>>> - * @seqno: a linear increasing sequence number for this context >>>>>>>>> - * >>>>>>>>> - * Initializes an allocated fence, the caller doesn't have to keep >>>>>>>>> its >>>>>>>>> - * refcount after committing with this fence, but it will need to >>>>>>>>> hold a >>>>>>>>> - * refcount again if &dma_fence_ops.enable_signaling gets called. >>>>>>>>> - * >>>>>>>>> - * context and seqno are used for easy comparison between fences, >>>>>>>>> allowing >>>>>>>>> - * to check which fence is later by simply using dma_fence_later(). >>>>>>>>> - */ >>>>>>>>> -void >>>>>>>>> -dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops >>>>>>>>> *ops, >>>>>>>>> - spinlock_t *lock, u64 context, u64 seqno) >>>>>>>>> +static void >>>>>>>>> +__dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops >>>>>>>>> *ops, >>>>>>>>> + spinlock_t *lock, u64 context, u64 seqno, unsigned long >>>>>>>>> flags) >>>>>>>>> { >>>>>>>>> BUG_ON(!lock); >>>>>>>>> BUG_ON(!ops || !ops->get_driver_name || >>>>>>>>> !ops->get_timeline_name); >>>>>>>>> @@ -1017,9 +1002,55 @@ dma_fence_init(struct dma_fence *fence, const >>>>>>>>> struct dma_fence_ops *ops, >>>>>>>>> fence->lock = lock; >>>>>>>>> fence->context = context; >>>>>>>>> fence->seqno = seqno; >>>>>>>>> - fence->flags = 0UL; >>>>>>>>> + fence->flags = flags; >>>>>>>>> fence->error = 0; >>>>>>>>> trace_dma_fence_init(fence); >>>>>>>>> } >>>>>>>>> + >>>>>>>>> +/** >>>>>>>>> + * dma_fence_init - Initialize a custom fence. >>>>>>>>> + * @fence: the fence to initialize >>>>>>>>> + * @ops: the dma_fence_ops for operations on this fence >>>>>>>>> + * @lock: the irqsafe spinlock to use for locking this fence >>>>>>>>> + * @context: the execution context this fence is run on >>>>>>>>> + * @seqno: a linear increasing sequence number for this context >>>>>>>>> + * >>>>>>>>> + * Initializes an allocated fence, the caller doesn't have to keep >>>>>>>>> its >>>>>>>>> + * refcount after committing with this fence, but it will need to >>>>>>>>> hold a >>>>>>>>> + * refcount again if &dma_fence_ops.enable_signaling gets called. >>>>>>>>> + * >>>>>>>>> + * context and seqno are used for easy comparison between fences, >>>>>>>>> allowing >>>>>>>>> + * to check which fence is later by simply using dma_fence_later(). >>>>>>>>> + */ >>>>>>>>> +void >>>>>>>>> +dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops >>>>>>>>> *ops, >>>>>>>>> + spinlock_t *lock, u64 context, u64 seqno) >>>>>>>>> +{ >>>>>>>>> + __dma_fence_init(fence, ops, lock, context, seqno, 0UL); >>>>>>>>> +} >>>>>>>>> EXPORT_SYMBOL(dma_fence_init); >>>>>>>>> + >>>>>>>>> +/** >>>>>>>>> + * dma_fence_init64 - Initialize a custom fence with 64-bit seqno >>>>>>>>> support. >>>>>>>>> + * @fence: the fence to initialize >>>>>>>>> + * @ops: the dma_fence_ops for operations on this fence >>>>>>>>> + * @lock: the irqsafe spinlock to use for locking this fence >>>>>>>>> + * @context: the execution context this fence is run on >>>>>>>>> + * @seqno: a linear increasing sequence number for this context >>>>>>>>> + * >>>>>>>>> + * Initializes an allocated fence, the caller doesn't have to keep >>>>>>>>> its >>>>>>>>> + * refcount after committing with this fence, but it will need to >>>>>>>>> hold a >>>>>>>>> + * refcount again if &dma_fence_ops.enable_signaling gets called. >>>>>>>>> + * >>>>>>>>> + * Context and seqno are used for easy comparison between fences, >>>>>>>>> allowing >>>>>>>>> + * to check which fence is later by simply using dma_fence_later(). >>>>>>>>> + */ >>>>>>>>> +void >>>>>>>>> +dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops >>>>>>>>> *ops, >>>>>>>>> + spinlock_t *lock, u64 context, u64 seqno) >>>>>>>>> +{ >>>>>>>>> + __dma_fence_init(fence, ops, lock, context, seqno, >>>>>>>>> + BIT(DMA_FENCE_FLAG_SEQNO64_BIT)); >>>>>>>>> +} >>>>>>>>> +EXPORT_SYMBOL(dma_fence_init64); >>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c >>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c >>>>>>>>> index 1a7469543db5..79713421bffe 100644 >>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c >>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c >>>>>>>>> @@ -134,7 +134,6 @@ static bool >>>>>>>>> amdgpu_eviction_fence_enable_signaling(struct dma_fence *f) >>>>>>>>> } >>>>>>>>> static const struct dma_fence_ops amdgpu_eviction_fence_ops = { >>>>>>>>> - .use_64bit_seqno = true, >>>>>>>>> .get_driver_name = amdgpu_eviction_fence_get_driver_name, >>>>>>>>> .get_timeline_name = amdgpu_eviction_fence_get_timeline_name, >>>>>>>>> .enable_signaling = amdgpu_eviction_fence_enable_signaling, >>>>>>>>> @@ -160,9 +159,9 @@ amdgpu_eviction_fence_create(struct >>>>>>>>> amdgpu_eviction_fence_mgr *evf_mgr) >>>>>>>>> ev_fence->evf_mgr = evf_mgr; >>>>>>>>> get_task_comm(ev_fence->timeline_name, current); >>>>>>>>> spin_lock_init(&ev_fence->lock); >>>>>>>>> - dma_fence_init(&ev_fence->base, &amdgpu_eviction_fence_ops, >>>>>>>>> - &ev_fence->lock, evf_mgr->ev_fence_ctx, >>>>>>>>> - atomic_inc_return(&evf_mgr->ev_fence_seq)); >>>>>>>>> + dma_fence_init64(&ev_fence->base, &amdgpu_eviction_fence_ops, >>>>>>>>> + &ev_fence->lock, evf_mgr->ev_fence_ctx, >>>>>>>>> + atomic_inc_return(&evf_mgr->ev_fence_seq)); >>>>>>>>> return ev_fence; >>>>>>>>> } >>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c >>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c >>>>>>>>> index 029cb24c28b3..5e92d00a591f 100644 >>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c >>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c >>>>>>>>> @@ -239,8 +239,8 @@ static int amdgpu_userq_fence_create(struct >>>>>>>>> amdgpu_usermode_queue *userq, >>>>>>>>> fence = &userq_fence->base; >>>>>>>>> userq_fence->fence_drv = fence_drv; >>>>>>>>> - dma_fence_init(fence, &amdgpu_userq_fence_ops, >>>>>>>>> &userq_fence->lock, >>>>>>>>> - fence_drv->context, seq); >>>>>>>>> + dma_fence_init64(fence, &amdgpu_userq_fence_ops, >>>>>>>>> &userq_fence->lock, >>>>>>>>> + fence_drv->context, seq); >>>>>>>>> amdgpu_userq_fence_driver_get(fence_drv); >>>>>>>>> dma_fence_get(fence); >>>>>>>>> @@ -334,7 +334,6 @@ static void amdgpu_userq_fence_release(struct >>>>>>>>> dma_fence *f) >>>>>>>>> } >>>>>>>>> static const struct dma_fence_ops amdgpu_userq_fence_ops = { >>>>>>>>> - .use_64bit_seqno = true, >>>>>>>>> .get_driver_name = amdgpu_userq_fence_get_driver_name, >>>>>>>>> .get_timeline_name = amdgpu_userq_fence_get_timeline_name, >>>>>>>>> .signaled = amdgpu_userq_fence_signaled, >>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c >>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c >>>>>>>>> index 51cddfa3f1e8..5d26797356a3 100644 >>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c >>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c >>>>>>>>> @@ -71,7 +71,6 @@ static void amdgpu_tlb_fence_work(struct >>>>>>>>> work_struct *work) >>>>>>>>> } >>>>>>>>> static const struct dma_fence_ops amdgpu_tlb_fence_ops = { >>>>>>>>> - .use_64bit_seqno = true, >>>>>>>>> .get_driver_name = amdgpu_tlb_fence_get_driver_name, >>>>>>>>> .get_timeline_name = amdgpu_tlb_fence_get_timeline_name >>>>>>>>> }; >>>>>>>>> @@ -101,8 +100,8 @@ void amdgpu_vm_tlb_fence_create(struct >>>>>>>>> amdgpu_device *adev, struct amdgpu_vm *vm >>>>>>>>> INIT_WORK(&f->work, amdgpu_tlb_fence_work); >>>>>>>>> spin_lock_init(&f->lock); >>>>>>>>> - dma_fence_init(&f->base, &amdgpu_tlb_fence_ops, &f->lock, >>>>>>>>> - vm->tlb_fence_context, atomic64_read(&vm->tlb_seq)); >>>>>>>>> + dma_fence_init64(&f->base, &amdgpu_tlb_fence_ops, &f->lock, >>>>>>>>> + vm->tlb_fence_context, atomic64_read(&vm->tlb_seq)); >>>>>>>>> /* TODO: We probably need a separate wq here */ >>>>>>>>> dma_fence_get(&f->base); >>>>>>>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h >>>>>>>>> index 48b5202c531d..a34a0dcdc446 100644 >>>>>>>>> --- a/include/linux/dma-fence.h >>>>>>>>> +++ b/include/linux/dma-fence.h >>>>>>>>> @@ -97,6 +97,7 @@ struct dma_fence { >>>>>>>>> }; >>>>>>>>> enum dma_fence_flag_bits { >>>>>>>>> + DMA_FENCE_FLAG_SEQNO64_BIT, >>>>>>>>> DMA_FENCE_FLAG_SIGNALED_BIT, >>>>>>>>> DMA_FENCE_FLAG_TIMESTAMP_BIT, >>>>>>>>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, >>>>>>>>> @@ -124,14 +125,6 @@ struct dma_fence_cb { >>>>>>>>> * >>>>>>>>> */ >>>>>>>>> struct dma_fence_ops { >>>>>>>>> - /** >>>>>>>>> - * @use_64bit_seqno: >>>>>>>>> - * >>>>>>>>> - * True if this dma_fence implementation uses 64bit seqno, false >>>>>>>>> - * otherwise. >>>>>>>>> - */ >>>>>>>>> - bool use_64bit_seqno; >>>>>>>>> - >>>>>>>>> /** >>>>>>>>> * @get_driver_name: >>>>>>>>> * >>>>>>>>> @@ -262,6 +255,9 @@ struct dma_fence_ops { >>>>>>>>> void dma_fence_init(struct dma_fence *fence, const struct >>>>>>>>> dma_fence_ops *ops, >>>>>>>>> spinlock_t *lock, u64 context, u64 seqno); >>>>>>>>> +void dma_fence_init64(struct dma_fence *fence, const struct >>>>>>>>> dma_fence_ops *ops, >>>>>>>>> + spinlock_t *lock, u64 context, u64 seqno); >>>>>>>>> + >>>>>>>>> void dma_fence_release(struct kref *kref); >>>>>>>>> void dma_fence_free(struct dma_fence *fence); >>>>>>>>> void dma_fence_describe(struct dma_fence *fence, struct seq_file >>>>>>>>> *seq); >>>>>>>>> @@ -454,7 +450,7 @@ static inline bool __dma_fence_is_later(struct >>>>>>>>> dma_fence *fence, u64 f1, u64 f2) >>>>>>>>> * 32bit sequence numbers. Use a 64bit compare when the >>>>>>>>> driver says to >>>>>>>>> * do so. >>>>>>>>> */ >>>>>>>>> - if (fence->ops->use_64bit_seqno) >>>>>>>>> + if (test_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags)) >>>>>>>>> return f1 > f2; >>>>>>>>> return (int)(lower_32_bits(f1) - lower_32_bits(f2)) > 0; >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >