On 3/31/26 12:13, Thomas Hellström wrote:
> On Tue, 2026-03-31 at 11:44 +0200, Christian König wrote:
>> On 3/31/26 11:20, Thomas Hellström wrote:
>>> The xe driver was using the drm_exec retry pointer directly to
>>> restart the locking loop after out-of-memory errors. This is
>>> relying on documented behaviour.
>>>
>>> Instead add a drm_exec_retry() macro that can be used in this
>>> situation, and that also asserts that the struct drm_exec is
>>> in a state that is compatible with retrying:
>>> Either newly initialized or in a contended state with all locks
>>> dropped.
>>>
>>> Use that macro in xe.
>>>
>>> Signed-off-by: Thomas Hellström <[email protected]>
>>> ---
>>>  drivers/gpu/drm/xe/xe_validation.h |  2 +-
>>>  include/drm/drm_exec.h             | 13 +++++++++++++
>>>  2 files changed, 14 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_validation.h
>>> b/drivers/gpu/drm/xe/xe_validation.h
>>> index a30e732c4d51..4cd955ce6cd2 100644
>>> --- a/drivers/gpu/drm/xe/xe_validation.h
>>> +++ b/drivers/gpu/drm/xe/xe_validation.h
>>> @@ -146,7 +146,7 @@ bool xe_validation_should_retry(struct
>>> xe_validation_ctx *ctx, int *ret);
>>>  #define xe_validation_retry_on_oom(_ctx,
>>> _ret)                               \
>>>     do
>>> {                                                           \
>>>             if (xe_validation_should_retry(_ctx,
>>> _ret))              \
>>> -                   goto
>>> *__drm_exec_retry_ptr;                      \
>>> +                   drm_exec_retry((_ctx)-
>>>> exec);                     \
>>
>> Oh, that goto is extremely questionable to begin with.
>>
>>>     } while (0)
>>>  
>>>  /**
>>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>>> index fc95a979e253..5ed5be1f8244 100644
>>> --- a/include/drm/drm_exec.h
>>> +++ b/include/drm/drm_exec.h
>>> @@ -138,6 +138,19 @@ static inline bool
>>> drm_exec_is_contended(struct drm_exec *exec)
>>>     return !!exec->contended;
>>>  }
>>>  
>>> +/**
>>> + * drm_exec_retry() - Unconditionally restart the loop to grab all
>>> locks.
>>> + * @exec: drm_exec object
>>> + *
>>> + * Unconditionally retry the loop to lock all objects. For
>>> consistency,
>>> + * the exec object needs to be newly initialized or contended.
>>> + */
>>> +#define drm_exec_retry(_exec)                              \
>>> +   do {                                            \
>>> +           WARN_ON(!drm_exec_is_contended(_exec)); \
>>
>> This warning would trigger!
>>
>> See the code in xe_bo_notifier_prepare_pinned() for example:
>>
>>                         drm_exec_retry_on_contention(&exec);
>>                         ret = PTR_ERR(backup);
>>                         xe_validation_retry_on_oom(&ctx, &ret);
>>
>> Without contention we would just skip the loop and never lock
>> anything.
>>
>> What XE does here just doesn't work as far as I can see.
> 
> So if the xe_validation_retry_on_oom() is actually retrying it
> internally call drm_exec_fini() and drm_exec_init() first, which means
> that the warning doesn't trigger, due to the dummy value of contended.

Ah! Yeah that information was missing.

I'm really wondering if the calls to drm_exec_fini()/drm_exec_init() should be 
part of the drm_exec_retry() handling.

Otherwise that is kind of easy to mess up.

Regards,
Christian.

> 
> So the warning does its job, and xe is safe.
> 
> Thanks,
> Thomas
> 
> 
> 
>>
>> Regards,
>> Christian.
>>
>>> +           goto *__drm_exec_retry_ptr;             \
>>> +   } while (0)
>>> +
>>>  void drm_exec_init(struct drm_exec *exec, u32 flags, unsigned nr);
>>>  void drm_exec_fini(struct drm_exec *exec);
>>>  bool drm_exec_cleanup(struct drm_exec *exec);

Reply via email to