Hey, Den 2025-09-29 kl. 14:42, skrev Janusz Krzysztofik: > Hi Christian, > > Thank you for looking at it and providing your R-b. > > On Sunday, 28 September 2025 16:00:57 CEST Christian König wrote: >> On 26.09.25 17:26, Janusz Krzysztofik wrote: >>> A timer that expires a vgem fence automatically in 10 seconds is now >>> released with timer_delete_sync() from fence->ops.release() called on last >>> dma_fence_put(). In some scenarios, it can run in IRQ context, which is >>> not safe unless TIMER_IRQSAFE is used. One potentially risky scenario was >>> demonstrated in Intel DRM CI trybot, BAT run on machine bat-adlp-6, while >>> working on new IGT subtests syncobj_timeline@stress-* as user space >>> replacements of some problematic test cases of a dma-fence-chain selftest > ... >>> Make the timer IRQ safe. >>> >>> [1] https://patchwork.freedesktop.org/series/154987/#rev2 >>> >>> Fixes: 4077798484459 ("drm/vgem: Attach sw fences to exported vGEM dma-buf >>> (ioctl)") >>> Signed-off-by: Janusz Krzysztofik <[email protected]> >> >> Reviewed-by: Christian König <[email protected]> >> >> We should also consider to lower the timeout massively. This needs to be in >> the range of 100m-1s to not cause the same trouble as the sw_sync framework. > > Assuming you are referring to potential hard lockups observed in sw_sync > based > version of the exercise, which piece of kernel code you would expect to loop > for too long with interrupts disabled due to the vgem fences auto expiry > timeout of 10s? > >> >> 10seconds is just way to long for that. > > Do you think DRM objects can't be busy for that long in real life? What > about > long running GPU compute tasks? > > Thanks, > Janusz > > >> >> Regards, >> Christian. >> >>> --- >>> drivers/gpu/drm/vgem/vgem_fence.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/vgem/vgem_fence.c >>> b/drivers/gpu/drm/vgem/vgem_fence.c >>> index fd76730fd38c0..07db319c3d7f9 100644 >>> --- a/drivers/gpu/drm/vgem/vgem_fence.c >>> +++ b/drivers/gpu/drm/vgem/vgem_fence.c >>> @@ -79,7 +79,7 @@ static struct dma_fence *vgem_fence_create(struct >>> vgem_file *vfile, >>> dma_fence_init(&fence->base, &vgem_fence_ops, &fence->lock, >>> dma_fence_context_alloc(1), 1); >>> >>> - timer_setup(&fence->timer, vgem_fence_timeout, 0); >>> + timer_setup(&fence->timer, vgem_fence_timeout, TIMER_IRQSAFE); >>> >>> /* We force the fence to expire within 10s to prevent driver hangs */ >>> mod_timer(&fence->timer, jiffies + VGEM_FENCE_TIMEOUT); >> >> > > > > Pushed the ptach, thanks.
I'm imagining it's mostly because of potential chaining of multiple fences, which can worsen the worst-case timeout easily. Kind regards, ~Maarten
