On Fri, May 9, 2025 at 8:34 AM Tvrtko Ursulin <tvrtko.ursu...@igalia.com> wrote: > > Dma-fence objects currently suffer from a potential use after free problem > where fences exported to userspace and other drivers can outlive the > exporting driver, or the associated data structures. > > The discussion on how to address this concluded that adding reference > counting to all the involved objects is not desirable, since it would need > to be very wide reaching and could cause unloadable drivers if another > entity would be holding onto a signaled fence reference potentially > indefinitely. > > This patch enables the safe access by introducing and documenting a > contract between fence exporters and users. It documents a set of > contraints and adds helpers which a) drivers with potential to suffer from > the use after free must use and b) users of the dma-fence API must use as > well. > > Premise of the design has multiple sides: > > 1. Drivers (fence exporters) MUST ensure a RCU grace period between > signalling a fence and freeing the driver private data associated with it. > > The grace period does not have to follow the signalling immediately but > HAS to happen before data is freed. > > 2. Users of the dma-fence API marked with such requirement MUST contain > the complete access to the data within a single code block guarded by the > new dma_fence_access_begin() and dma_fence_access_end() helpers. > > The combination of the two ensures that whoever sees the > DMA_FENCE_FLAG_SIGNALED_BIT not set is guaranteed to have access to a > valid fence->lock and valid data potentially accessed by the fence->ops > virtual functions, until the call to dma_fence_access_end(). > > 3. Module unload (fence->ops) disappearing is for now explicitly not > handled. That would required a more complex protection, possibly needing > SRCU instead of RCU to handle callers such as dma_fence_wait_timeout(), > where race between dma_fence_enable_sw_signaling, signalling, and > dereference of fence->ops->wait() would need a sleeping SRCU context. > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursu...@igalia.com> > --- > drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++++++++++++++ > include/linux/dma-fence.h | 32 ++++++++++++----- > 2 files changed, 93 insertions(+), 8 deletions(-) > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c > index dc2456f68685..cfe1d7b79c22 100644 > --- a/drivers/dma-buf/dma-fence.c > +++ b/drivers/dma-buf/dma-fence.c > @@ -533,6 +533,7 @@ void dma_fence_release(struct kref *kref) > struct dma_fence *fence = > container_of(kref, struct dma_fence, refcount); > > + dma_fence_access_begin(); > trace_dma_fence_destroy(fence); > > if (WARN(!list_empty(&fence->cb_list) && > @@ -560,6 +561,8 @@ void dma_fence_release(struct kref *kref) > fence->ops->release(fence); > else > dma_fence_free(fence); > + > + dma_fence_access_end(); > } > EXPORT_SYMBOL(dma_fence_release); > > @@ -982,11 +985,13 @@ EXPORT_SYMBOL(dma_fence_set_deadline); > */ > void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq) > { > + dma_fence_access_begin(); > seq_printf(seq, "%s %s seq %llu %ssignalled\n", > dma_fence_driver_name(fence), > dma_fence_timeline_name(fence), > fence->seqno, > dma_fence_is_signaled(fence) ? "" : "un"); > + dma_fence_access_end(); > } > EXPORT_SYMBOL(dma_fence_describe); > > @@ -1033,3 +1038,67 @@ dma_fence_init64(struct dma_fence *fence, const struct > dma_fence_ops *ops, > __set_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags); > } > EXPORT_SYMBOL(dma_fence_init64); > + > +/** > + * dma_fence_driver_name - Access the driver name > + * @fence: the fence to query > + * > + * Returns a driver name backing the dma-fence implementation. > + * > + * IMPORTANT CONSIDERATION: > + * Dma-fence contract stipulates that access to driver provided data (data > not > + * directly embedded into the object itself), such as the &dma_fence.lock and > + * memory potentially accessed by the &dma_fence.ops functions, is forbidden > + * after the fence has been signalled. Drivers are allowed to free that data, > + * and some do. > + * > + * To allow safe access drivers are mandated to guarantee a RCU grace period > + * between signalling the fence and freeing said data. > + * > + * As such access to the driver name is only valid inside a RCU locked > section. > + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block > guarded > + * by the &dma_fence_access_being and &dma_fence_access_end pair. > + */ > +const char *dma_fence_driver_name(struct dma_fence *fence) > +{ > + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), > + "rcu_read_lock() required for safe access to > returned string"); > + > + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) > + return fence->ops->get_driver_name(fence); > + else > + return "detached-driver"; > +} > +EXPORT_SYMBOL(dma_fence_driver_name); > + > +/** > + * dma_fence_timeline_name - Access the timeline name > + * @fence: the fence to query > + * > + * Returns a timeline name provided by the dma-fence implementation. > + * > + * IMPORTANT CONSIDERATION: > + * Dma-fence contract stipulates that access to driver provided data (data > not > + * directly embedded into the object itself), such as the &dma_fence.lock and > + * memory potentially accessed by the &dma_fence.ops functions, is forbidden > + * after the fence has been signalled. Drivers are allowed to free that data, > + * and some do. > + * > + * To allow safe access drivers are mandated to guarantee a RCU grace period > + * between signalling the fence and freeing said data. > + * > + * As such access to the driver name is only valid inside a RCU locked > section. > + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block > guarded > + * by the &dma_fence_access_being and &dma_fence_access_end pair. > + */ > +const char *dma_fence_timeline_name(struct dma_fence *fence) > +{ > + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), > + "rcu_read_lock() required for safe access to > returned string"); > + > + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) > + return fence->ops->get_driver_name(fence); > + else > + return "signaled-timeline";
This means that trace_dma_fence_signaled() will get the wrong timeline/driver name, which probably screws up perfetto and maybe other tools. Maybe it would work well enough just to move the trace_dma_fence_signaled() call ahead of the test_and_set_bit()? Idk if some things will start getting confused if they see that trace multiple times. Maybe a better solution would be to add a new bit for this purpose, which is set after the tracepoint in dma_fence_signal_timestamp_locked(). (trace_dma_fence_destroy() will I guess be messed up regardless.. it doesn't look like perfetto cares about this tracepoint, so maybe that is ok. It doesn't seem so useful.) BR, -R > +} > +EXPORT_SYMBOL(dma_fence_timeline_name); > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h > index c814a86087f8..c8a9915eb360 100644 > --- a/include/linux/dma-fence.h > +++ b/include/linux/dma-fence.h > @@ -387,15 +387,31 @@ bool dma_fence_remove_callback(struct dma_fence *fence, > struct dma_fence_cb *cb); > void dma_fence_enable_sw_signaling(struct dma_fence *fence); > > -static inline const char *dma_fence_driver_name(struct dma_fence *fence) > -{ > - return fence->ops->get_driver_name(fence); > -} > +/** > + * DOC: Safe external access to driver provided object members > + * > + * All data not stored directly in the dma-fence object, such as the > + * &dma_fence.lock and memory potentially accessed by functions in the > + * &dma_fence.ops table, MUST NOT be accessed after the fence has been > signalled > + * because after that point drivers are allowed to free it. > + * > + * All code accessing that data via the dma-fence API (or directly, which is > + * discouraged), MUST make sure to contain the complete access within a > + * &dma_fence_access_begin and &dma_fence_access_end pair. > + * > + * Some dma-fence API handles this automatically, while other, as for example > + * &dma_fence_driver_name and &dma_fence_timeline_name, leave that > + * responsibility to the caller. > + * > + * To enable this scheme to work drivers MUST ensure a RCU grace period > elapses > + * between signalling the fence and freeing the said data. > + * > + */ > +#define dma_fence_access_begin rcu_read_lock > +#define dma_fence_access_end rcu_read_unlock > > -static inline const char *dma_fence_timeline_name(struct dma_fence *fence) > -{ > - return fence->ops->get_timeline_name(fence); > -} > +const char *dma_fence_driver_name(struct dma_fence *fence); > +const char *dma_fence_timeline_name(struct dma_fence *fence); > > /** > * dma_fence_is_signaled_locked - Return an indication if the fence > -- > 2.48.0 >