On 3/31/26 12:13, Thomas Hellström wrote: > On Tue, 2026-03-31 at 11:44 +0200, Christian König wrote: >> On 3/31/26 11:20, Thomas Hellström wrote: >>> The xe driver was using the drm_exec retry pointer directly to >>> restart the locking loop after out-of-memory errors. This is >>> relying on documented behaviour. >>> >>> Instead add a drm_exec_retry() macro that can be used in this >>> situation, and that also asserts that the struct drm_exec is >>> in a state that is compatible with retrying: >>> Either newly initialized or in a contended state with all locks >>> dropped. >>> >>> Use that macro in xe. >>> >>> Signed-off-by: Thomas Hellström <[email protected]> >>> --- >>> drivers/gpu/drm/xe/xe_validation.h | 2 +- >>> include/drm/drm_exec.h | 13 +++++++++++++ >>> 2 files changed, 14 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/xe/xe_validation.h >>> b/drivers/gpu/drm/xe/xe_validation.h >>> index a30e732c4d51..4cd955ce6cd2 100644 >>> --- a/drivers/gpu/drm/xe/xe_validation.h >>> +++ b/drivers/gpu/drm/xe/xe_validation.h >>> @@ -146,7 +146,7 @@ bool xe_validation_should_retry(struct >>> xe_validation_ctx *ctx, int *ret); >>> #define xe_validation_retry_on_oom(_ctx, >>> _ret) \ >>> do >>> { \ >>> if (xe_validation_should_retry(_ctx, >>> _ret)) \ >>> - goto >>> *__drm_exec_retry_ptr; \ >>> + drm_exec_retry((_ctx)- >>>> exec); \ >> >> Oh, that goto is extremely questionable to begin with. >> >>> } while (0) >>> >>> /** >>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h >>> index fc95a979e253..5ed5be1f8244 100644 >>> --- a/include/drm/drm_exec.h >>> +++ b/include/drm/drm_exec.h >>> @@ -138,6 +138,19 @@ static inline bool >>> drm_exec_is_contended(struct drm_exec *exec) >>> return !!exec->contended; >>> } >>> >>> +/** >>> + * drm_exec_retry() - Unconditionally restart the loop to grab all >>> locks. >>> + * @exec: drm_exec object >>> + * >>> + * Unconditionally retry the loop to lock all objects. For >>> consistency, >>> + * the exec object needs to be newly initialized or contended. >>> + */ >>> +#define drm_exec_retry(_exec) \ >>> + do { \ >>> + WARN_ON(!drm_exec_is_contended(_exec)); \ >> >> This warning would trigger! >> >> See the code in xe_bo_notifier_prepare_pinned() for example: >> >> drm_exec_retry_on_contention(&exec); >> ret = PTR_ERR(backup); >> xe_validation_retry_on_oom(&ctx, &ret); >> >> Without contention we would just skip the loop and never lock >> anything. >> >> What XE does here just doesn't work as far as I can see. > > So if the xe_validation_retry_on_oom() is actually retrying it > internally call drm_exec_fini() and drm_exec_init() first, which means > that the warning doesn't trigger, due to the dummy value of contended.
Ah! Yeah that information was missing. I'm really wondering if the calls to drm_exec_fini()/drm_exec_init() should be part of the drm_exec_retry() handling. Otherwise that is kind of easy to mess up. Regards, Christian. > > So the warning does its job, and xe is safe. > > Thanks, > Thomas > > > >> >> Regards, >> Christian. >> >>> + goto *__drm_exec_retry_ptr; \ >>> + } while (0) >>> + >>> void drm_exec_init(struct drm_exec *exec, u32 flags, unsigned nr); >>> void drm_exec_fini(struct drm_exec *exec); >>> bool drm_exec_cleanup(struct drm_exec *exec);
