On Tue, 2026-03-31 at 12:13 +0200, Thomas Hellström wrote:
> On Tue, 2026-03-31 at 11:44 +0200, Christian König wrote:
> > On 3/31/26 11:20, Thomas Hellström wrote:
> > > The xe driver was using the drm_exec retry pointer directly to
> > > restart the locking loop after out-of-memory errors. This is
> > > relying on undocumented behaviour.
> > > 
> > > Instead add a drm_exec_retry() macro that can be used in this
> > > situation, and that also asserts that the struct drm_exec is
> > > in a state that is compatible with retrying:
> > > Either newly initialized or in a contended state with all locks
> > > dropped.
> > > 
> > > Use that macro in xe.
> > > 
> > > Signed-off-by: Thomas Hellström <[email protected]>
> > > ---
> > >  drivers/gpu/drm/xe/xe_validation.h |  2 +-
> > >  include/drm/drm_exec.h             | 13 +++++++++++++
> > >  2 files changed, 14 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
> > > index a30e732c4d51..4cd955ce6cd2 100644
> > > --- a/drivers/gpu/drm/xe/xe_validation.h
> > > +++ b/drivers/gpu/drm/xe/xe_validation.h
> > > @@ -146,7 +146,7 @@ bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret);
> > >  #define xe_validation_retry_on_oom(_ctx, _ret) \
> > >  	do { \
> > >  		if (xe_validation_should_retry(_ctx, _ret)) \
> > > -			goto *__drm_exec_retry_ptr; \
> > > +			drm_exec_retry((_ctx)->exec); \
> > 
> > Oh, that goto is extremely questionable to begin with.
> > 
> > >  	} while (0)
> > >  
> > >  /**
> > > diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> > > index fc95a979e253..5ed5be1f8244 100644
> > > --- a/include/drm/drm_exec.h
> > > +++ b/include/drm/drm_exec.h
> > > @@ -138,6 +138,19 @@ static inline bool drm_exec_is_contended(struct drm_exec *exec)
> > >  	return !!exec->contended;
> > >  }
> > >  
> > > +/**
> > > + * drm_exec_retry() - Unconditionally restart the loop to grab all locks.
> > > + * @exec: drm_exec object
> > > + *
> > > + * Unconditionally retry the loop to lock all objects. For consistency,
> > > + * the exec object needs to be newly initialized or contended.
> > > + */
> > > +#define drm_exec_retry(_exec) \
> > > +	do { \
> > > +		WARN_ON(!drm_exec_is_contended(_exec)); \
> > 
> > This warning would trigger!
> > 
> > See the code in xe_bo_notifier_prepare_pinned() for example:
> > 
> > 	drm_exec_retry_on_contention(&exec);
> > 	ret = PTR_ERR(backup);
> > 	xe_validation_retry_on_oom(&ctx, &ret);
> > 
> > Without contention we would just skip the loop and never lock
> > anything.
> > 
> > What XE does here just doesn't work as far as I can see.
> 
> So if the xe_validation_retry_on_oom() is actually retrying, it
> internally calls drm_exec_fini() and drm_exec_init() first, which means
> that the warning doesn't trigger, due to the dummy value of contended.
> 
> So the warning does its job, and xe is safe.

So the xe stuff is actually basically an outer loop around
drm_exec_until_all_locked(). We could of course code that explicitly by
implementing an xe_validation_until_all_valid() with a separate goto
ptr, but I'm not sure that is cleaner, really. They'd point to the same
address anyway.

In the end, the WARN_ON in drm_exec_retry() would ensure drm_exec is
not in an awkward state anyway.

Thanks,
Thomas

> 
> Thanks,
> Thomas
> 
> > 
> > Regards,
> > Christian.
> > 
> > > +		goto *__drm_exec_retry_ptr; \
> > > +	} while (0)
> > > +
> > >  void drm_exec_init(struct drm_exec *exec, u32 flags, unsigned nr);
> > >  void drm_exec_fini(struct drm_exec *exec);
> > >  bool drm_exec_cleanup(struct drm_exec *exec);
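
To make the argument above easier to follow, here is a rough userspace sketch of the retry pattern under discussion. It is a toy model, not the drm_exec implementation: every toy_* name is hypothetical, the "dummy contended value after init" behaviour is modelled only as described in this thread, and WARN_ON is replaced by a counter. It relies on the GNU C labels-as-values extension (gcc or clang).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct drm_exec; models only what the thread discusses. */
struct toy_exec {
	void *contended; /* non-NULL: freshly initialized (dummy) or contended */
	int locked;      /* number of objects currently "locked" */
	int warnings;    /* how often the would-be WARN_ON fired */
};

static int toy_dummy;
#define TOY_DUMMY ((void *)&toy_dummy)

static void toy_exec_init(struct toy_exec *e)
{
	e->contended = TOY_DUMMY; /* assumption: init leaves a dummy contended value */
	e->locked = 0;
}

static void toy_exec_fini(struct toy_exec *e)
{
	e->contended = NULL;
	e->locked = 0;
}

/* Loop condition: true while another locking pass is needed. */
static bool toy_exec_cleanup(struct toy_exec *e)
{
	if (!e->contended)
		return false;    /* nothing pending: the loop is done */
	e->locked = 0;           /* start the pass with no locks held */
	e->contended = NULL;
	return true;
}

/* Restart the loop; count a warning if the state is not retry-compatible. */
#define TOY_EXEC_RETRY(e, ptr)                  \
	do {                                    \
		if (!(e)->contended)            \
			(e)->warnings++;        \
		goto *(ptr);                    \
	} while (0)

/* Simulates oom_failures OOM errors; returns the number of locking passes. */
static int toy_run(struct toy_exec *e, int oom_failures, bool reinit)
{
	int passes = 0;
	void *retry_ptr;

	e->warnings = 0;
	toy_exec_init(e);
restart:
	retry_ptr = &&restart;   /* GNU C: address of a label */
	while (toy_exec_cleanup(e)) {
		passes++;
		e->locked++;     /* pretend one object got locked */
		if (oom_failures > 0) {
			oom_failures--;
			if (reinit) {
				/* outer-loop retry: tear down and start fresh */
				toy_exec_fini(e);
				toy_exec_init(e);
			}
			TOY_EXEC_RETRY(e, retry_ptr);
		}
	}
	return passes;
}
```

With reinit set, each simulated OOM goes through fini/init before jumping back, so the check stays quiet and the loop runs once more per failure. Without it, the check fires and the loop is skipped on re-entry, which is the "never lock anything" case Christian describes.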
