Ah, yes. That makes a bit more sense, in this case the patch is indeed 
necessary.

But why is XE hard coding MAX_SCHEDULE_TIMEOUT here while i915 isn't?

Regards,
Christian

On 04.07.25 15:05, Aakash Deep Sarkar wrote:
>> And as far as I can see i915 is using the config option 
>> DRM_I915_FENCE_TIMEOUT here. So as long as you don't set this to very very 
>> large value this should work as expected.
> 
> Sorry I forgot to mention this in the commit message; this issue is happening 
> in the xe driver. DRM_I915_FENCE_TIMEOUT config is used only in the case of 
> i915 driver. For xe it's hardcoded to MAX_SCHEDULE_TIMEOUT in the 
> compat-i915-headers layer.
> 
> Regards,
> Aakash
> 
> -----Original Message-----
> From: Christian König <christian.koe...@amd.com> 
> Sent: 04 July 2025 17:57
> To: Jani Nikula <jani.nik...@linux.intel.com>; Aakash Deep Sarkar 
> <aakash.deep.sar...@intel.com>; intel-gfx@lists.freedesktop.org
> Cc: Badrappan, Jeevaka <jeevaka.badrap...@intel.com>; Ville Syrjälä 
> <ville.syrj...@linux.intel.com>; Maarten Lankhorst 
> <maarten.lankho...@linux.intel.com>; Hogander, Jouni 
> <jouni.hogan...@intel.com>; Alex Deucher <alexander.deuc...@amd.com>; Lucas 
> Stach <l.st...@pengutronix.de>; Rob Clark <robin.cl...@oss.qualcomm.com>; 
> Thierry Reding <thierry.red...@gmail.com>; Julia Lawall 
> <julia.law...@inria.fr>; Dan Carpenter <dan.carpen...@linaro.org>
> Subject: Re: [PATCH] drm/i915/display: Change ret value type from int to long
> 
> On 04.07.25 14:00, Jani Nikula wrote:
>> On Fri, 04 Jul 2025, Aakash Deep Sarkar <aakash.deep.sar...@intel.com> wrote:
>>> dma_fence_wait_timeout returns a long type but the driver is only 
>>> using the lower 32 bits of the retval and discarding the upper 32 
>>> bits.
>>>
>>> This is particularly problematic if there are already signalled or 
>>> stub fences on some of the hw planes. In this case the 
>>> dma_fence_wait_timeout function will immediately return with timeout 
>>> value MAX_SCHEDULE_TIMEOUT (0x7fffffffffffffff) since the fence is 
>>> already signalled.
> 
> That is not correct. If the fence is signaled dma_fence_wait_timeout() 
> returns the remaining timeout or at least 1 if the input timeout was 0.
> 
> Could be that i915 has a separately implemented fence_ops->wait callback 
> which incorrectly returns MAX_SCHEDULE_TIMEOUT, but i strongly doubt that 
> because that would break tons of stuff.
> 
>> If the driver only uses the lower
>>> 32 bits of this return value then it'll interpret it as an error code 
>>> (0xFFFFFFFF or (-1)) and skip the wait on the remaining fences.
>>>
>>> This issue was first observed with the Android compositor where the 
>>> GPU composited layer was not properly waited on when there were stub 
>>> fences in other overlay planes resulting in significant visual 
>>> artifacts.
>>
>> Thanks for the patch, good catch!
> 
> If I'm not completely mistaken this patch is not necessary.
> 
> dma_fence_wait_timeout() does *not* return MAX_SCHEDULE_TIMEOUT when the 
> fence is signaled, but rather the remaining timeout which is the input value 
> at maximum.
> 
> So when the input timeout fits into 32bits the returned timeout also fits 
> into 32bits as well.
> 
> And as far as I can see i915 is using the config option 
> DRM_I915_FENCE_TIMEOUT here. So as long as you don't set this to very very 
> large value this should work as expected.
> 
>>> Test: No graphical artifacts with shadertoy
>>
>> We've never used this commit trailer before, please let's not start now.
>>
>> The subject should plainly state the "what", and the commit message 
>> should answer the "why". You do have that here, but I think the 
>> subject is still a bit too much nuts and bolts.
>>
>> For example,
>>
>>      drm/i915/display: Fix dma_fence_wait_timeout() return value handling
>>
>> would state the "what" in *much* more helpful terms for anyone looking 
>> at git logs.
>>
>> Presumably this has been an error for some time. You should do a 
>> little bit of git blame on the offending lines. It'll find commit 
>> d59cf7bb73f3
>> ("drm/i915/display: Use dma_fence interfaces instead of i915_sw_fence").
>>
>> Based on that, we should add:
>>
>> Fixes: d59cf7bb73f3 ("drm/i915/display: Use dma_fence interfaces 
>> instead of i915_sw_fence")
>> Cc: Ville Syrjälä <ville.syrj...@linux.intel.com>
>> Cc: Maarten Lankhorst <maarten.lankho...@linux.intel.com>
>> Cc: Jouni Högander <jouni.hogan...@intel.com>
>> Cc: <sta...@vger.kernel.org> # v6.8+
>>
>> Then it occurs to me this looks like a common mistake to make. A 
>> little bit of git grep on dma_fence_wait_timeout() quickly finds 
>> multiple similar mistakes in drm, at least amdgpu, etnaviv, msm, and 
>> tegra. Cc some maintainers FYI. This class of bugs could cause issues 
>> elsewhere.
> 
> I can only guess but I think all those use cases use a fixed small timeout as 
> well. IIRC amdgpu uses a timeout in the millisecond range.
> 
>> Let's also Cc Julia and Dan in case they have ideas to improve static 
>> analysis to catch this class of bugs. Or maybe one exists already, but 
>> we're just not running them!
> 
> Yeah, agree. A script which checks if the input timeout could be >32bit and 
> the return value is assigned to something <=32bits is probably a really good 
> idea.
> 
> Regards,
> Christian.
> 
>>
>> Finally, for the actual change,
>>
>> Reviewed-by: Jani Nikula <jani.nik...@intel.com>
>>
>>
>>> Signed-off-by: Aakash Deep Sarkar <aakash.deep.sar...@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/display/intel_display.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
>>> b/drivers/gpu/drm/i915/display/intel_display.c
>>> index 456fc4b04cda..7035c1fc9033 100644
>>> --- a/drivers/gpu/drm/i915/display/intel_display.c
>>> +++ b/drivers/gpu/drm/i915/display/intel_display.c
>>> @@ -7092,7 +7092,8 @@ static void intel_atomic_commit_fence_wait(struct 
>>> intel_atomic_state *intel_stat
>>>     struct drm_i915_private *i915 = to_i915(intel_state->base.dev);
>>>     struct drm_plane *plane;
>>>     struct drm_plane_state *new_plane_state;
>>> -   int ret, i;
>>> +   long ret;
>>> +   int i;
>>>  
>>>     for_each_new_plane_in_state(&intel_state->base, plane, new_plane_state, 
>>> i) {
>>>             if (new_plane_state->fence) {
>>
> 

Reply via email to