Applied.  Thanks!

Alex
On Tue, Mar 31, 2026 at 10:29 AM Christian König <[email protected]> wrote:
>
>
> On 3/31/26 16:21, Mikhail Gavrilov wrote:
> > Replace the PASID IDR + spinlock with XArray as noted in the TODO
> > left by commit dccd79bb1c7f ("drm/amdgpu: fix the idr allocation
> > flags").
> >
> > The IDR conversion still has an IRQ safety issue:
> > amdgpu_pasid_free() can be called from hardirq context via the fence
> > signal path, but amdgpu_pasid_idr_lock is taken with plain spin_lock()
> > in process context, creating a potential deadlock:
> >
> >   CPU0
> >   ----
> >   spin_lock(&amdgpu_pasid_idr_lock)    // process context, IRQs on
> >   <Interrupt>
> >     spin_lock(&amdgpu_pasid_idr_lock)  // deadlock
> >
> > The hardirq call chain is:
> >
> >   sdma_v6_0_process_trap_irq
> >    -> amdgpu_fence_process
> >     -> dma_fence_signal
> >      -> drm_sched_job_done
> >       -> dma_fence_signal
> >        -> amdgpu_pasid_free_cb
> >         -> amdgpu_pasid_free
> >
> > Use XArray with XA_FLAGS_LOCK_IRQ (all xa operations use IRQ-safe
> > locking internally) and XA_FLAGS_ALLOC1 (zero is not a valid PASID).
> > Both xa_alloc_cyclic() and xa_erase() then handle locking
> > consistently, fixing the IRQ safety issue and removing the need for
> > an explicit spinlock.
> >
> > Suggested-by: Lijo Lazar <[email protected]>
> > Fixes: e6d765de3d6b ("drm/amdgpu: prevent immediate PASID reuse case")
> > Signed-off-by: Mikhail Gavrilov <[email protected]>
> > Reviewed-by: Christian König <[email protected]>
> >
> > ---
> >
> > v7: Rebased on amd-staging-drm-next which already includes
> >     dccd79bb1c7f ("drm/amdgpu: fix the idr allocation flags").
> >     Updated commit message to reflect that sleeping-under-spinlock
> >     is already fixed and the xarray conversion now addresses the
> >     remaining IRQ safety issue. Inverted error check to
> >     if (r < 0) return r; per Christian König.
> > v6: Use DEFINE_XARRAY_FLAGS with XA_FLAGS_LOCK_IRQ | XA_FLAGS_ALLOC1
> >     so all xa operations use IRQ-safe locking internally. Drop
> >     Cc: stable since the regression was never released to any stable
> >     kernel. (Christian König)
> >
> >     https://lore.kernel.org/all/[email protected]/
> > v5: Use explicit xa_lock_irqsave/__xa_erase for amdgpu_pasid_free()
> >     since xa_erase() only uses plain xa_lock() which is not safe from
> >     hardirq context.
> >
> >     https://lore.kernel.org/all/[email protected]/
> > v4: Use xa_alloc_cyclic/xa_erase directly instead of explicit
> >     xa_lock_irqsave, as suggested by Lijo Lazar.
> >
> >     https://lore.kernel.org/all/[email protected]/
> > v3: Replace IDR with XArray instead of fixing the spinlock, as
> >     suggested by Lijo Lazar.
> >
> >     https://lore.kernel.org/all/[email protected]/
> > v2: Added second patch fixing the {HARDIRQ-ON-W} -> {IN-HARDIRQ-W}
> >     lock inconsistency (spin_lock -> spin_lock_irqsave).
> >
> >     https://lore.kernel.org/all/[email protected]/
> > v1: Fixed sleeping-under-spinlock (idr_alloc_cyclic with GFP_KERNEL)
> >     using idr_preload/GFP_NOWAIT.
> >
> >     https://lore.kernel.org/all/[email protected]/
> >
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 34 ++++++++++---------------
> >  1 file changed, 13 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > index e495a8fa13fd..a6ac3b4ce0df 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > @@ -22,7 +22,7 @@
> >   */
> >  #include "amdgpu_ids.h"
> >
> > -#include <linux/idr.h>
> > +#include <linux/xarray.h>
> >  #include <linux/dma-fence-array.h>
> >
> >
> > @@ -40,8 +40,8 @@
> >   * VMs are looked up from the PASID per amdgpu_device.
> >   */
> >
> > -static DEFINE_IDR(amdgpu_pasid_idr);
> > -static DEFINE_SPINLOCK(amdgpu_pasid_idr_lock);
> > +static DEFINE_XARRAY_FLAGS(amdgpu_pasid_xa, XA_FLAGS_LOCK_IRQ | XA_FLAGS_ALLOC1);
> > +static u32 amdgpu_pasid_xa_next;
> >
> >  /* Helper to free pasid from a fence callback */
> >  struct amdgpu_pasid_cb {
> > @@ -62,22 +62,19 @@ struct amdgpu_pasid_cb {
> >   */
> >  int amdgpu_pasid_alloc(unsigned int bits)
> >  {
> > -	int pasid;
> > +	u32 pasid;
> > +	int r;
> >
> >  	if (bits == 0)
> >  		return -EINVAL;
> >
> > -	spin_lock(&amdgpu_pasid_idr_lock);
> > -	/* TODO: Need to replace the idr with an xarry, and then
> > -	 * handle the internal locking with ATOMIC safe paths.
> > -	 */
> > -	pasid = idr_alloc_cyclic(&amdgpu_pasid_idr, NULL, 1,
> > -				 1U << bits, GFP_ATOMIC);
> > -	spin_unlock(&amdgpu_pasid_idr_lock);
> > -
> > -	if (pasid >= 0)
> > -		trace_amdgpu_pasid_allocated(pasid);
> > +	r = xa_alloc_cyclic(&amdgpu_pasid_xa, &pasid, xa_mk_value(0),
> > +			    XA_LIMIT(1, (1U << bits) - 1),
> > +			    &amdgpu_pasid_xa_next, GFP_KERNEL);
> > +	if (r < 0)
> > +		return r;
> >
> > +	trace_amdgpu_pasid_allocated(pasid);
> >  	return pasid;
> >  }
> >
> > @@ -88,10 +85,7 @@ int amdgpu_pasid_alloc(unsigned int bits)
> >  void amdgpu_pasid_free(u32 pasid)
> >  {
> >  	trace_amdgpu_pasid_freed(pasid);
> > -
> > -	spin_lock(&amdgpu_pasid_idr_lock);
> > -	idr_remove(&amdgpu_pasid_idr, pasid);
> > -	spin_unlock(&amdgpu_pasid_idr_lock);
> > +	xa_erase(&amdgpu_pasid_xa, pasid);
> >  }
> >
> >  static void amdgpu_pasid_free_cb(struct dma_fence *fence,
> > @@ -634,7 +628,5 @@ void amdgpu_vmid_mgr_fini(struct amdgpu_device *adev)
> >   */
> >  void amdgpu_pasid_mgr_cleanup(void)
> >  {
> > -	spin_lock(&amdgpu_pasid_idr_lock);
> > -	idr_destroy(&amdgpu_pasid_idr);
> > -	spin_unlock(&amdgpu_pasid_idr_lock);
> > +	xa_destroy(&amdgpu_pasid_xa);
> >  }
>
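[Editorial note: the cyclic allocation the patch relies on — IDs are handed out in increasing order and a freed ID is not reused until the range wraps, which is what avoids immediate PASID reuse — can be sketched with a simplified userspace model. `alloc_cyclic()`, `free_id()`, and the `used[]` array below are hypothetical stand-ins, not the kernel implementation; the real `xa_alloc_cyclic()` also stores an entry in the XArray and takes the IRQ-safe xa_lock selected by XA_FLAGS_LOCK_IRQ, both omitted here. The return convention mirrors the kernel's: 0 on success, 1 if the search wrapped, -EBUSY when the range is full.]

```c
#include <assert.h>
#include <stdbool.h>

#define EBUSY  16
#define MAX_ID 8            /* tiny demo range: valid IDs are 1..7 (ALLOC1: 0 excluded) */

static bool used[MAX_ID];   /* stand-in for the XArray's occupied slots */

/* Simplified model of xa_alloc_cyclic(): search upward from *next,
 * wrap around to the minimum (1) only when the upper part is exhausted.
 * On success, advance *next past the allocated ID.
 */
static int alloc_cyclic(unsigned int *id, unsigned int *next)
{
	for (unsigned int i = *next; i < MAX_ID; i++) {
		if (!used[i]) {
			used[i] = true;
			*id = i;
			*next = i + 1;
			return 0;
		}
	}

	/* Upper part exhausted: wrap and retry from the minimum. */
	unsigned int stop = *next < MAX_ID ? *next : MAX_ID;

	for (unsigned int i = 1; i < stop; i++) {
		if (!used[i]) {
			used[i] = true;
			*id = i;
			*next = i + 1;
			return 1;   /* allocation succeeded but wrapped */
		}
	}
	return -EBUSY;          /* every ID in the range is in use */
}

/* Models xa_erase(): the slot becomes free, but because allocation is
 * cyclic it will not be handed out again until *next wraps past it.
 */
static void free_id(unsigned int id)
{
	used[id] = false;
}
```

The second assertion block below shows the property the patch cares about: after `free_id(1)`, ID 1 is not reissued immediately; allocation keeps climbing and only returns to 1 (with the "wrapped" return value) once the upper end of the range is exhausted.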
