On Mon, Jul 07, 2025 at 05:18:13PM +0200, Simona Vetter wrote: > Object creation is a careful dance where we must guarantee that the > object is fully constructed before it is visible to other threads, and > GEM buffer objects are no difference. > > Final publishing happens by calling drm_gem_handle_create(). After > that the only allowed thing to do is call drm_gem_object_put() because > a concurrent call to the GEM_CLOSE ioctl with a correctly guessed id > (which is trivial since we have a linear allocator) can already tear > down the object again. > > Luckily most drivers get this right, the very few exceptions I've > pinged the relevant maintainers for. Unfortunately we also need > drm_gem_handle_create() when creating additional handles for an > already existing object (e.g. GETFB ioctl or the various bo import > ioctl), and hence we cannot have a drm_gem_handle_create_and_put() as > the only exported function to stop these issues from happening. > > Now unfortunately the implementation of drm_gem_handle_create() isn't > living up to standards: It does correctly finishe object > initialization at the global level, and hence is safe against a > concurrent tear down. But it also sets up the file-private aspects of > the handle, and that part goes wrong: We fully register the object in > the drm_file.object_idr before calling drm_vma_node_allow() or > obj->funcs->open, which opens up races against concurrent removal of > that handle in drm_gem_handle_delete(). > > Fix this with the usual two-stage approach of first reserving the > handle id, and then only registering the object after we've completed > the file-private setup. > > Jacek reported this with a testcase of concurrently calling GEM_CLOSE > on a freshly-created object (which also destroys the object), but it > should be possible to hit this with just additional handles created > through import or GETFB without completed destroying the underlying > object with the concurrent GEM_CLOSE ioctl calls. > > Note that the close-side of this race was fixed in f6cd7daecff5 ("drm: > Release driver references to handle before making it available > again"), which means a cool 9 years have passed until someone noticed > that we need to make this symmetry or there's still gaps left :-/ > Without the 2-stage close approach we'd still have a race, therefore > that's an integral part of this bugfix. > > More importantly, this means we can have NULL pointers behind > allocated id in our drm_file.object_idr. We need to check for that > now: > > - drm_gem_handle_delete() checks for ERR_OR_NULL already > > - drm_gem.c:object_lookup() also chekcs for NULL > > - drm_gem_release() should never be called if there's another thread > still existing that could call into an IOCTL that creates a new > handle, so cannot race. For paranoia I added a NULL check to > drm_gem_object_release_handle() though. > > - most drivers (etnaviv, i915, msm) are find because they use > idr_find(), which maps both ENOENT and NULL to NULL. > > - drivers using idr_for_each_entry() should also be fine, because > idr_get_next does filter out NULL entries and continues the > iteration. > > - The same holds for drm_show_memory_stats(). > > v2: Use drm_WARN_ON (Thomas) > > Reported-by: Jacek Lawrynowicz <jacek.lawrynow...@linux.intel.com> > Tested-by: Jacek Lawrynowicz <jacek.lawrynow...@linux.intel.com> > Reviewed-by: Thomas Zimmermann <tzimmerm...@suse.de> > Cc: sta...@vger.kernel.org > Cc: Jacek Lawrynowicz <jacek.lawrynow...@linux.intel.com> > Cc: Maarten Lankhorst <maarten.lankho...@linux.intel.com> > Cc: Maxime Ripard <mrip...@kernel.org> > Cc: Thomas Zimmermann <tzimmerm...@suse.de> > Cc: David Airlie <airl...@gmail.com> > Cc: Simona Vetter <sim...@ffwll.ch> > Signed-off-by: Simona Vetter <simona.vet...@intel.com> > Signed-off-by: Simona Vetter <simona.vet...@ffwll.ch>
Pushed to drm-misc-fixes, thanks for the reviews. -Sima > --- > drivers/gpu/drm/drm_gem.c | 10 +++++++++- > include/drm/drm_file.h | 3 +++ > 2 files changed, 12 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c > index bc505d938b3e..1aa9192c4cc6 100644 > --- a/drivers/gpu/drm/drm_gem.c > +++ b/drivers/gpu/drm/drm_gem.c > @@ -316,6 +316,9 @@ drm_gem_object_release_handle(int id, void *ptr, void > *data) > struct drm_file *file_priv = data; > struct drm_gem_object *obj = ptr; > > + if (drm_WARN_ON(obj->dev, !data)) > + return 0; > + > if (obj->funcs->close) > obj->funcs->close(obj, file_priv); > > @@ -436,7 +439,7 @@ drm_gem_handle_create_tail(struct drm_file *file_priv, > idr_preload(GFP_KERNEL); > spin_lock(&file_priv->table_lock); > > - ret = idr_alloc(&file_priv->object_idr, obj, 1, 0, GFP_NOWAIT); > + ret = idr_alloc(&file_priv->object_idr, NULL, 1, 0, GFP_NOWAIT); > > spin_unlock(&file_priv->table_lock); > idr_preload_end(); > @@ -457,6 +460,11 @@ drm_gem_handle_create_tail(struct drm_file *file_priv, > goto err_revoke; > } > > + /* mirrors drm_gem_handle_delete to avoid races */ > + spin_lock(&file_priv->table_lock); > + obj = idr_replace(&file_priv->object_idr, obj, handle); > + WARN_ON(obj != NULL); > + spin_unlock(&file_priv->table_lock); > *handlep = handle; > return 0; > > diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h > index eab7546aad79..115763799625 100644 > --- a/include/drm/drm_file.h > +++ b/include/drm/drm_file.h > @@ -300,6 +300,9 @@ struct drm_file { > * > * Mapping of mm object handles to object pointers. Used by the GEM > * subsystem. Protected by @table_lock. > + * > + * Note that allocated entries might be NULL as a transient state when > + * creating or deleting a handle. > */ > struct idr object_idr; > > -- > 2.49.0 > -- Simona Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch